


Europäisches Patentamt 
European Patent Office 
Office européen des brevets 




(11) Publication number: 0 682 115 A1 



EUROPEAN PATENT APPLICATION 



Application number: 95201374.6 
(22) Date of filing : 07.09.89 



(51) Int. Cl.6: C12N 15/32, C07K 14/325, 
C12N 15/82, A01N 63/00 



This application was filed on 24 - 05 - 1995 as a 
divisional application to the application 
mentioned under INID code 60. 



(30) Priority : 09.09.88 US 242482 



(43) Date of publication of application 
15.11.95 Bulletin 95/46 



(60) Publication number of the earlier application in 
accordance with Art. 76 EPC: 0 359 472 



(84) Designated Contracting States : 

AT BE CH DE ES FR GB GR IT LI LU NL SE 



Inventor : Adang, Michael J. 
160 Tamarack Drive 
Athens, Georgia 30605 (US) 
Inventor : Rocheleau, Thomas A. 
3100 Buena Vista Street 
Madison, Wisconsin 53704 (US) 
Inventor : Merlo, Donald J. 
601 Hampden Court 
Midland, Michigan 48640 (US) 
Inventor : Murray, Elizabeth E. 
2426 Commonwealth Avenue 
Madison, Wisconsin 53711 (US) 



(74) Representative: Fisher, Adrian John 
CARPMAELS & RANSFORD 
43 Bloomsbury Square 
London WC1A 2RA (GB) 



(71) Applicant: Mycogen Plant Science, Inc. 
1209 Orange Street 
Wilmington, Delaware 19801 (US) 



(54) Synthetic insecticidal crystal protein gene. 

(57) Synthetic Bacillus thuringiensis toxin genes designed to be expressed in plants at a level higher than 
naturally-occurring Bt genes are provided. These genes utilize codons preferred in highly expressed 
monocot or dicot proteins. 






Jouve, 18, rue Saint-Denis, 75001 PARIS 






EP 0 682 115 A1 



FIELD OF THE INVENTION 






This invention relates to the field of bacterial molecular biology and, in particular, to genetic engineering 
by recombinant technology for the purpose of protecting plants from insect pests. Disclosed herein are the 
chemical synthesis of a modified crystal protein gene from Bacillus thuringiensis var. tenebrionis (Btt) and 
the selective expression of this synthetic insecticidal gene. Also disclosed is the transfer of the cloned synthetic 
gene into a host microorganism, rendering the organism capable of producing, at improved levels of expression, 
a protein having toxicity to insects. This invention facilitates the genetic engineering of bacteria and plants to 
attain desired expression levels of novel toxins having agronomic value. 

BACKGROUND OF THE INVENTION 

Bacillus thuringiensis (Bt) is unique in its ability to produce, during the process of sporulation, proteinaceous 
crystalline inclusions which are found to be highly toxic to several insect pests of agricultural importance. The 
crystal proteins of different Bt strains have a rather narrow host range and hence are used commercially as 
very selective biological insecticides. Numerous strains of Bt are toxic to lepidopteran and dipteran insects. 
Recently, two subspecies (or varieties) of Bt have been reported to be pathogenic to coleopteran insects: var. 
tenebrionis (Krieg et al. (1983) Z. Angew. Entomol. 96:500-508) and var. san diego (Herrnstadt et al. (1986) 
Biotechnol. 4:305-308). Both strains produce flat, rectangular crystal inclusions and have a major crystal 
component of 64-68 kDa (Herrnstadt et al. supra; Bernhard (1986) FEMS Microbiol. Lett. 33:261-265). 

Toxin genes from several subspecies of Bt have been cloned, and the recombinant clones were found to 
be toxic to lepidopteran and dipteran insect larvae. The two coleopteran-active toxin genes have also been 
isolated and expressed. Herrnstadt et al. supra cloned a 5.8 kb BamHI fragment of Bt var. san diego DNA. The 
protein expressed in E. coli was toxic to P. luteola (elm leaf beetle) and had a molecular weight of approximately 
83 kDa. This 83 kDa toxin product from the var. san diego gene was larger than the 64 kDa crystal protein 
isolated from Bt var. san diego cells, suggesting that the Bt var. san diego crystal protein may be synthesized 
as a larger precursor molecule that is processed by Bt var. san diego, but not by E. coli, prior to being formed 
into a crystal. 

Sekar et al. (1987) Proc. Natl. Acad. Sci. USA 84:7036-7040; U.S. Patent Application 108,285, filed October 
13, 1987, isolated the crystal protein gene from Btt and determined its nucleotide sequence. This crystal protein 
gene was contained on a 5.9 kb BamHI fragment (pNSBP544). A subclone containing the 3 kb HindIII fragment 
from pNSBP544 was constructed. This HindIII fragment contains an open reading frame (ORF) that encodes 
a 644-amino acid polypeptide of approximately 73 kDa. Extracts of both subclones exhibited toxicity to 
larvae of Colorado potato beetle (Leptinotarsa decemlineata, a coleopteran insect). 73- and 65-kDa peptides 
that cross-reacted with an antiserum against the crystal protein of var. tenebrionis were produced on 
expression in E. coli. Sporulating var. tenebrionis cells contain an immunoreactive 73-kDa peptide that corresponds 
to the expected product from the ORF of pNSBP544. However, isolated crystals primarily contain a 65-kDa 
component. When the crystal protein gene was shortened at the N-terminal region, the dominant protein 
product obtained was the 65-kDa peptide. A deletion derivative, p544Pst-Met5, was enzymatically derived from the 
5.9 kb BamHI fragment upon removal of forty-six amino acid residues from the N-terminus. Expression of the 
N-terminal deletion derivative, p544Pst-Met5, resulted in the production of, almost exclusively, the 65 kDa 
protein. Recently, McPherson et al. (1988) Biotechnology 6:61-66 demonstrated that the Btt gene contains two 
functional translational initiation codons in the same reading frame, leading to the production of both the 
full-length protein and an N-terminal truncated form. 

Chimeric toxin genes from several strains of Bt have been expressed in plants. Four modified Bt2 genes 
from var. berliner 1715, under the control of the 2' promoter of the Agrobacterium TR-DNA, were transferred 
into tobacco plants (Vaeck et al. (1987) Nature 328:33-37). Insecticidal levels of toxin were produced when 
truncated genes were expressed in transgenic plants. However, the steady-state mRNA levels in the transgenic 
plants were so low that they could not be reliably detected in Northern blot analysis and hence were quantified 
using ribonuclease protection experiments. Bt mRNA levels in plants producing the highest level of protein 
corresponded to approximately 0.0001% of the poly(A)+ mRNA. 

In the report by Vaeck et al. (1987) supra, expression of chimeric genes containing the entire coding 
sequence of Bt2 was compared to that of chimeric genes containing truncated Bt2 genes. Additionally, some T-DNA constructs 
included a chimeric NPTII gene as a marker selectable in plants, whereas other constructs carried translational 
fusions between fragments of Bt2 and the NPTII gene. Insecticidal levels of toxin were produced when 
truncated Bt2 genes or fusion constructs were expressed in transgenic plants. Greenhouse-grown plants produced 
up to 0.02% of the total soluble protein as the toxin, or 3 ng of toxin per g fresh leaf tissue, and, even at five-fold 
lower levels, showed 100% mortality in six-day feeding assays. However, no significant insecticidal activity 
could be obtained using the intact coding sequence, despite the fact that the same promoter was used to 
direct its expression. Intact Bt2 protein and RNA yields in the transgenic plant leaves were 10-50 times lower 
than those for the truncated Bt2 polypeptides or fusion proteins. 

Barton et al. (1987) Plant Physiol. 85:1103-1109 showed expression of a Bt protein in a system containing 
a 35S promoter, a viral (TMV) leader sequence, the Bt HD-1 4.5 kb gene (encoding a 645 amino acid protein 
followed by two proline residues) and a nopaline synthase (nos) poly(A)+ sequence. Under these conditions, 
expression was observed for Bt mRNA at levels up to 47 pg/20 µg RNA and 12 ng/mg plant protein. This amount 
of Bt protein in plant tissue produced 100% mortality in two days. This level of expression still represents a 
low level of mRNA (2.5 × 10⁻⁴%) and protein (1.2 × 10⁻³%). 

Various hybrid proteins consisting of N-terminal fragments of increasing length of the Bt2 protein fused to 
NPTII were produced in E. coli by Höfte et al. (1988) FEBS Lett. 226:364-370. Fusion proteins containing the 
first 607 amino acids of Bt2 exhibited insect toxicity; fusion proteins not containing this minimum N-terminal 
fragment were nontoxic. Appearance of NPTII activity was not dependent upon the presence of insecticidal 
activity; however, the conformation of the Bt2 polypeptide appeared to exert an important influence on the 
enzymatic activity of the fused NPTII protein. This study did suggest that the global 3-D structure of the Bt2 
polypeptide is disturbed in truncated polypeptides. 

A number of researchers have attempted to express plant genes in yeast (Neill et al. (1987) Gene 55:303-317; 
Rothstein et al. (1987) Gene 55:353-356; Coraggio et al. (1986) EMBO J. 5:459-465) and E. coli (Fuzakawa 
et al. (1987) FEBS Lett. 224:125-127; Vies et al. (1986) EMBO J. 5:2439-2444; Gatenby et al. (1987) 
Eur. J. Biochem. 168:227-231). In the case of wheat α-gliadin (Neill et al. (1987) supra) and α-amylase (Rothstein 
et al. (1987) supra) genes, and maize zein genes (Coraggio et al. (1986) supra) in yeast, low levels of expression 
have been reported. Neill et al. have suggested that the low levels of expression of α-gliadin in yeast may be 
due in part to codon usage bias, since α-gliadin codons for Phe, Leu, Ser, Gly, Tyr and especially Glu do not 
correlate well with the abundant yeast isoacceptor tRNAs. In E. coli, however, soybean glycinin A2 (Fuzakawa 
et al. (1987) supra) and wheat RuBPCSSU (Vies et al. (1986) supra) genes were expressed adequately. 

Not much is known about the makeup of tRNA populations in plants. Viotti et al. (1978) Biochim. Biophys. 
Acta 517:125-132 report that maize endosperm actively synthesizing zein, a storage protein rich in glutamine, 
leucine, and alanine, is characterized by higher levels of accepting activity for these three amino acids than 
are maize embryo tRNAs. This may indicate that the tRNA population of specific plant tissues is adapted 
for optimum translation of highly expressed proteins such as zein. To our knowledge, no one has experimentally 
altered codon bias in highly expressed plant genes to determine its possible effects on protein translation 
and on the level of expression in plants. 

SUMMARY OF THE INVENTION 

It is the overall object of the present invention to provide a means for plant protection against insect 
damage. The invention disclosed herein comprises a chemically synthesized gene encoding an insecticidal protein 
which is functionally equivalent to a native insecticidal protein of Bt. This synthetic gene is designed to be 
expressed in plants at a level higher than a native Bt gene. It is preferred that the synthetic gene be designed to 
be highly expressed in plants as defined herein. Preferably, the synthetic gene is at least approximately 85% 
homologous to an insecticidal protein gene of Bt. 

It is a particular object of this invention to provide a synthetic structural gene coding for an insecticidal 
protein from Btt having, for example, the nucleotide sequences presented in Figure 1 and spanning nucleotides 
1 through 1793 or spanning nucleotides 1 through 1833 with functional equivalence. 

In designing synthetic Btt genes of this invention for enhanced expression in plants, the DNA sequence 
of the native Btt structural gene is modified in order to contain codons preferred by highly expressed plant 
genes, to attain an A+T content in nucleotide base composition substantially that found in plants, and also 
preferably to form a plant initiation sequence, to eliminate sequences that cause destabilization, inappropriate 
polyadenylation, degradation and termination of RNA, and to avoid sequences that constitute secondary 
structure hairpins and RNA splice sites. In the synthetic genes, codons used to specify a given amino acid are 
selected with regard to the distribution frequency of codon usage employed in highly expressed plant genes to 
specify that amino acid. As is appreciated by those skilled in the art, the distribution frequency of codon usage 
utilized in the synthetic gene is a determinant of the level of expression. Hence, the synthetic gene is designed 
such that its distribution frequency of codon usage deviates, preferably, no more than 25% from that of highly 
expressed plant genes and, more preferably, no more than about 10%. In addition, consideration is given to 
the percentage G+C content of the degenerate third base (monocotyledons appear to favor G+C in this 
position, whereas dicotyledons do not). It is also recognized that the XCG codon is the least preferred codon 
in dicots, whereas the XTA codon is avoided in both monocots and dicots. The synthetic genes of this invention 
also preferably have CG and TA doublet avoidance indices, as defined in the Detailed Description, closely 
approximating those of the chosen host plant. More preferably, these indices deviate from those of the host by no 
more than about 10-15%. 
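
The composition criteria above (A+T content, G+C at the degenerate third position, and avoidance of XCG and XTA codons) can be checked with simple sequence statistics. The sketch below is a minimal Python illustration, assuming an in-frame coding sequence that starts at its first base; the function names are hypothetical, and the counts are raw tallies, not the CG and TA doublet avoidance indices defined later in the Detailed Description.

```python
def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons, ignoring trailing bases."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def at_content(seq: str) -> float:
    """Fraction of bases that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def gc3_content(seq: str) -> float:
    """Fraction of codons whose third (degenerate) base is G or C."""
    cs = codons(seq)
    return sum(codon[2] in "GC" for codon in cs) / len(cs)


def count_codons_ending(seq: str, doublet: str) -> int:
    """Count codons ending in a doublet: 'CG' tallies XCG, 'TA' tallies XTA."""
    return sum(codon.endswith(doublet) for codon in codons(seq))


if __name__ == "__main__":
    example = "ATGACGCTAGGC"  # hypothetical fragment: ATG ACG CTA GGC
    print(round(at_content(example), 3))       # 0.417
    print(gc3_content(example))                # 0.75
    print(count_codons_ending(example, "CG"))  # 1 (ACG)
    print(count_codons_ending(example, "TA"))  # 1 (CTA)
```

For a monocot host one would look for a high third-position G+C fraction, and for either host the XTA tally should stay low.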

Assembly of the Bt gene of this invention is performed using standard technology known to the art. The 
Btt structural gene designed for enhanced expression in plants of the specific embodiment is enzymatically 
assembled within a DNA vector from chemically synthesized oligonucleotide duplex segments. The synthetic 
Bt gene is then introduced into a plant host cell and expressed by means known to the art. The insecticidal 
protein produced upon expression of the synthetic Bt gene in plants is functionally equivalent to a native Bt 
crystal protein in having toxicity to the same insects. 

BRIEF DESCRIPTION OF THE FIGURES 
Figure 1 presents the nucleotide sequence for the synthetic Btt gene. Where different, the native sequence 
as found in p544Pst-Met5 is shown above. Changes in amino acids (underlined) occur in the synthetic 
sequence, with alanine replacing threonine at residue 2 and leucine replacing the stop at residue 596, followed by 
the addition of 13 amino acids at the C-terminus. 

Figure 2 represents a simplified scheme used in the construction of the synthetic Btt gene. Segments A 
through M represent oligonucleotide pieces annealed and ligated together to form DNA duplexes having unique 
splice sites to allow specific enzymatic assembly of the DNA segments to give the desired gene. 

Figure 3 is a schematic diagram showing the assembly of oligonucleotide segments in the construction 
of a synthetic Btt gene. Each segment (A through M) is built from oligonucleotides of different sizes, annealed 
and ligated to form the desired DNA segment. 
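
Assembly from segments with unique splice sites works only if each junction overhang can ligate to a single partner. As a minimal sketch (not the authors' procedure), the Python below assumes that each internal junction leaves a hypothetical 4-nt single-stranded overhang copied from the gene sequence at the cut point, and checks that no overhang is self-complementary or duplicated on either strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def junction_overhangs(gene: str, cuts: list[int], width: int = 4) -> list[str]:
    """Overhang at each internal cut position (an assumed design rule)."""
    gene = gene.upper()
    return [gene[p:p + width] for p in cuts]


def overhangs_are_unique(overhangs: list[str]) -> bool:
    """True if every overhang can pair only with its intended partner."""
    seen: set[str] = set()
    for oh in overhangs:
        rc = revcomp(oh)
        if oh == rc:                  # palindromic: segment could self-ligate
            return False
        if oh in seen or rc in seen:  # would cross-ligate with another junction
            return False
        seen.update((oh, rc))
    return True


if __name__ == "__main__":
    gene = "ATGGCTAGCAAAGGTCCGTTTGAATAA"  # hypothetical sequence
    ohs = junction_overhangs(gene, [6, 15])
    print(ohs)                             # ['AGCA', 'CCGT']
    print(overhangs_are_unique(ohs))       # True
    print(overhangs_are_unique(["GATC"]))  # False: palindromic
```

Rejecting palindromic overhangs matters because a segment carrying one could ligate to a copy of itself instead of to its neighbour.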

DETAILED DESCRIPTION OF THE INVENTION 

The following definitions are provided to clarify the intent and scope of these terms as used in 
the Specification and claims. 

Expression refers to the transcription and translation of a structural gene to yield the encoded protein. 
The synthetic Bt genes of the present invention are designed to be expressed at a higher level in plants than 
the corresponding native Bt genes. As will be appreciated by those skilled in the art, structural gene expression 
levels are affected by the regulatory DNA sequences (promoter, polyadenylation sites, enhancers, etc.) 
employed and by the host cell in which the structural gene is expressed. Comparisons of synthetic Bt gene 
expression and native Bt gene expression must be made employing analogous regulatory sequences and in the 
same host cell. It will also be apparent that analogous means of assessing gene expression must be employed 
in such comparisons. 

Promoter refers to the nucleotide sequences at the 5' end of a structural gene which direct the initiation 
of transcription. Promoter sequences are necessary, but not always sufficient, to drive the expression of a 
downstream gene. In prokaryotes, the promoter drives transcription by providing binding sites for RNA 
polymerases and other initiation and activation factors. Usually promoters drive transcription preferentially in the 
downstream direction, although promotional activity can be demonstrated (at a reduced level of expression) 
when the gene is placed upstream of the promoter. The level of transcription is regulated by promoter 
sequences. Thus, in the construction of heterologous promoter/structural gene combinations, the structural gene is 
placed under the regulatory control of a promoter such that the expression of the gene is controlled by promoter 
sequences. The promoter is preferably positioned upstream of the structural gene and at a distance from 
the transcription start site that approximates the distance between the promoter and the gene it controls in its 
natural setting. As is known in the art, some variation in this distance can be tolerated without loss of promoter 
function. 

A gene refers to the entire DNA portion involved in the synthesis of a protein. A gene embodies the 
structural or coding portion, which begins at the 5' end from the translational start codon (usually ATG) and extends 
to the stop (TAG, TGA or TAA) codon at the 3' end. It also contains a promoter region, usually located 5' or 
upstream of the structural gene, which initiates and regulates the expression of the structural gene. Also included 
in a gene are the 3' end and poly(A)+ addition sequences. 

Structural gene is that portion of a gene comprising a DNA segment encoding a protein, polypeptide or a 
portion thereof, and excluding the 5' sequence which drives the initiation of transcription. The structural gene 
may be one which is normally found in the cell or one which is not normally found in the cellular location wherein 
it is introduced, in which case it is termed a heterologous gene. A heterologous gene may be derived in whole 
or in part from any source known to the art, including a bacterial genome or episome, eukaryotic, nuclear or 
plasmid DNA, cDNA, viral DNA or chemically synthesized DNA. A structural gene may contain one or more 
modifications in either the coding or the untranslated regions which could affect the biological activity or the 
chemical structure of the expression product, the rate of expression or the manner of expression control. Such 
modifications include, but are not limited to, mutations, insertions, deletions and substitutions of one or more 
nucleotides. The structural gene may constitute an uninterrupted coding sequence or it may include one or 
more introns, bounded by the appropriate splice junctions. The structural gene may be a composite of 
segments derived from a plurality of sources, naturally occurring or synthetic. The structural gene may also 
encode a fusion protein. 

Synthetic gene refers to a DNA sequence of a structural gene that is chemically synthesized in its entirety 
or for the greater part of the coding region. As exemplified herein, oligonucleotide building blocks are 
synthesized using procedures known to those skilled in the art and are annealed and ligated to form gene segments, 
which are then enzymatically assembled to construct the entire gene. As is recognized by those skilled in the 
art, genes functionally and structurally equivalent to the synthetic genes described herein may be prepared 
by site-specific mutagenesis or other related methods used in the art. 

Transforming refers to stably introducing a DNA segment carrying a functional gene into an organism that 
did not previously contain that gene. 

Plant tissue includes differentiated and undifferentiated tissues of plants, including but not limited to roots, 
shoots, leaves, pollen, seeds, tumor tissue and various forms of cells in culture, such as single cells, 
protoplasts, embryos and callus tissue. The plant tissue may be in planta or in organ, tissue or cell culture. 

Plant cell as used herein includes plant cells in planta and plant cells and protoplasts in culture. 

Homology refers to identity or near identity of nucleotide or amino acid sequences. As is understood in 
the art, nucleotide mismatches can occur at the third or wobble base in the codon without causing amino acid 
substitutions in the final polypeptide sequence. Also, minor nucleotide modifications (e.g., substitutions, 
insertions or deletions) in certain regions of the gene sequence can be tolerated and considered insignificant 
whenever such modifications result in changes in amino acid sequence that do not alter functionality of the 
final product. It has been shown that chemically synthesized copies of whole, or parts of, gene sequences can 
replace the corresponding regions in the natural gene without loss of gene function. Homologs of specific DNA 
sequences may be identified by those skilled in the art using the test of cross-hybridization of nucleic acids 
under conditions of stringency, as is well understood in the art (as described in Hames and Higgins (eds.) 
(1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK). Extent of homology is often measured in terms of 
percentage of identity between the sequences compared. 
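
Percentage identity between two sequences of equal length that are already in register (for example, a native and a synthetic coding sequence for the same protein) can be sketched as below. This is a simplifying assumption: the function performs no alignment, so it does not handle insertions or deletions, and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    native = "GCTACAGTA"     # hypothetical: Ala-Thr-Val
    synthetic = "GCCACCGTG"  # same peptide, different third (wobble) bases
    print(round(percent_identity(native, synthetic), 1))  # 66.7
```

The example shows the point made above: every mismatch falls on a wobble base, so the two sequences share only about two thirds identity yet encode the same peptide.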

Functionally equivalent refers to identity or near identity of function. A synthetic gene product which is toxic 
to at least one of the same insect species as a natural Bt protein is considered functionally equivalent thereto. 
As exemplified herein, both natural and synthetic Btt genes encode 65 kDa insecticidal proteins having 
essentially identical amino acid sequences and having toxicity to coleopteran insects. The synthetic Bt genes of 
the present invention are not considered to be functionally equivalent to native Bt genes, since they are 
expressible at a higher level in plants than native Bt genes. 

Frequency of preferred codon usage refers to the preference exhibited by a specific host cell in usage of 
nucleotide codons to specify a given amino acid. To determine the frequency of usage of a particular codon 
in a gene, the number of occurrences of that codon in the gene is divided by the total number of occurrences 
of all codons specifying the same amino acid in the gene. Table 1, for example, gives the frequency of codon 
usage for Bt genes, which was obtained by analysis of four Bt genes whose sequences are publicly available. 
Similarly, the frequency of preferred codon usage exhibited by a host cell can be calculated by averaging the 
frequency of preferred codon usage in a large number of genes expressed by the host cell. It is preferable that 
this analysis be limited to genes that are highly expressed by the host cell. Table 1, for example, gives the 
frequency of codon usage by highly expressed genes exhibited by dicotyledonous plants and monocotyledonous 
plants. The dicot codon usage was calculated using 154 highly expressed coding sequences obtained from 
GenBank, which are listed in Table 1. Monocot codon usage was calculated using 53 monocot nuclear gene 
coding sequences obtained from GenBank and listed in Table 1, located in Example 1. 
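
The per-gene calculation described above divides each codon's count by the count of all codons for the same amino acid. The Python below is a minimal sketch using the standard genetic code; the function name and example sequence are hypothetical, and stop codons and codons containing ambiguous bases are ignored. Host-cell values would come from applying the same count to many highly expressed genes and averaging, which this sketch does not do.

```python
from collections import Counter

# Standard genetic code, codons ordered with bases T, C, A, G; "*" marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_frequencies(seq: str) -> dict[str, float]:
    """Frequency of each sense codon among all codons for the same amino acid."""
    seq = seq.upper()
    codon_list = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Keep only recognised sense codons (stops and ambiguous codons map to "*").
    counts = Counter(c for c in codon_list if CODON_TABLE.get(c, "*") != "*")
    per_amino_acid: Counter = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]] for codon, n in counts.items()}


if __name__ == "__main__":
    freqs = codon_frequencies("GTAGTTGTTGTCATG")  # hypothetical: 4 Val + Met
    print(freqs)  # {'GTA': 0.25, 'GTT': 0.5, 'GTC': 0.25, 'ATG': 1.0}
```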
When synthesizing a gene for improved expression in a host cell, it is desirable to design the gene such 
that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell. 

The percent deviation of the frequency of preferred codon usage for a synthetic gene from that employed 
by a host cell is calculated by first determining the percent deviation of the frequency of usage of each single codon 
from that of the host cell and then obtaining the average deviation over all codons. As defined herein, this 
calculation includes unique codons (i.e., ATG and TGG). The frequency of preferred codon usage of the 
synthetic Btt gene, whose sequence is given in Figure 1, is given in Table 1. The frequency of preferred usage of 
the codon GTA for valine in the synthetic gene (0.10) deviates from that preferred by dicots (0.12) by 0.02/0.12 
= 0.167, or 16.7%. The average deviation over all amino acid codons of the Btt synthetic gene codon usage 
from that of dicot plants is 7.8%. In general terms, the overall average deviation of the codon usage of a 
synthetic gene from that of a host cell is calculated using the equation 

$$\text{average deviation (\%)} = \frac{1}{Z}\sum_{n=1}^{Z}\frac{\lvert X_n - Y_n\rvert}{X_n}\times 100$$

where X_n = frequency of usage for codon n in the host cell and Y_n = frequency of usage for codon n in the synthetic 
gene. Here n represents an individual codon that specifies an amino acid, and the total number of codons is Z, 
which in the preferred embodiment is 61. The overall deviation of the frequency of codon usage for all amino 
acids should preferably be less than about 25%, and more preferably less than about 10%. 
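
The equation can be sketched directly in Python. This is a minimal illustration, assuming both codon tables are dictionaries from codon to frequency and that Z is the number of codons listed in the host table (61 in the preferred embodiment). It rejects zero host frequencies, a case the text does not address. Only the valine figures in the usage line come from the text; reproducing the 7.8% figure would need all 61 entries of Table 1.

```python
def average_deviation(host: dict[str, float], synthetic: dict[str, float]) -> float:
    """Mean of |X_n - Y_n| / X_n over the Z codons listed for the host, in percent."""
    if not host:
        raise ValueError("host codon table is empty")
    total = 0.0
    for codon, x in host.items():
        if x <= 0:
            # The formula divides by X_n, so a zero host frequency is undefined.
            raise ValueError(f"host frequency for {codon} must be positive")
        y = synthetic.get(codon, 0.0)  # codon absent from synthetic gene -> 0
        total += abs(x - y) / x
    return 100.0 * total / len(host)


if __name__ == "__main__":
    # Single-codon check against the valine example in the text.
    print(round(average_deviation({"GTA": 0.12}, {"GTA": 0.10}), 1))  # 16.7
```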

Derived from is used to mean taken, obtained, received, traced, replicated or descended from a source 
(chemical and/or biological). A derivative may be produced by chemical or biological manipulation (including 
but not limited to substitution, addition, insertion, deletion, extraction, isolation, mutation and replication) of 
the original source. 

Chemically synthesized, as related to a sequence of DNA, means that the component nucleotides were 
assembled in vitro. Manual chemical synthesis of DNA may be accomplished using well-established procedures 
(Caruthers, M. (1983) in Methodology of DNA and RNA Sequencing, Weissman (ed.), Praeger Publishers, New 
York, Chapter 1), or automated chemical synthesis can be performed using one of a number of commercially 
available machines. 

Designed to be highly expressed, as used herein, refers to a level of expression of a designed 
gene wherein the amount of its specific mRNA transcripts produced is sufficient to be quantified in Northern 
blots and, thus, represents a level of specific mRNA expressed corresponding to greater than or equal to 
approximately 0.001% of the poly(A)+ mRNA. To date, natural Bt genes are transcribed at a level wherein the 
amount of specific mRNA produced is insufficient to be estimated using the Northern blot technique. However, 
in the present invention, transcription of a synthetic Bt gene designed to be highly expressed not only allows 
quantification of the specific mRNA transcripts produced but also results in enhanced expression of the 
translation product, which is measured in insecticidal bioassays. 

Crystal protein or insecticidal crystal protein or crystal toxin refers to the major protein component of the 
parasporal crystals formed in strains of Bt. This protein component exhibits selective pathogenicity to different 
species of insects. The molecular size of the major protein isolated from parasporal crystals varies depending 
on the strain of Bt from which it is derived. Crystal proteins having molecular weights of approximately 132, 
65, and 28 kDa have been reported. It has been shown that the approximately 132 kDa protein is a protoxin 
that is cleaved to form an approximately 65 kDa toxin. 

The crystal protein gene refers to the DNA sequence encoding the insecticidal crystal protein in either full 
length protoxin or toxin form, depending on the strain of Bt from which the gene is derived. 

The authors of this invention observed that expression in plants of Bt crystal protein mRNA occurs at levels 
that are not routinely detectable in Northern blots and that low levels of Bt crystal protein expression correspond 
to this low level of mRNA expression. It is preferred for exploitation of these genes as potential biocontrol 
methods that the level of expression of Bt genes in plant cells be improved and that the stability of Bt mRNA in plants 
be optimized. This will allow greater levels of Bt mRNA to accumulate and will result in an increase in the amount 
of insecticidal protein in plant tissues. This is essential for the control of insects that are relatively resistant to 
Bt protein. 

Thus, this invention is based on the recognition that expression levels of desired, recombinant insecticidal 
protein in transgenic plants can be improved via increased expression of stabilized mRNA transcripts and that, 
conversely, detection of these stabilized RNA transcripts may be utilized to measure expression of the 
translational product (protein). This invention provides a means of resolving the problem of low expression of 
insecticidal protein RNA in plants and, therefore, of low protein expression through the use of an improved, synthetic 
gene specifying an insecticidal crystal protein from Bt. 

Attempts to improve the levels of expression of Bt genes in plants have centered on comparative studies 
evaluating parameters such as gene type, gene length, choice of promoters, addition of plant viral untranslated 
RNA leader, addition of intron sequence and modification of nucleotides surrounding the initiation ATG codon. 
To date, changes in these parameters have not led to significant enhancement of Bt protein expression in 
plants. Applicants find that, surprisingly, to express Bt proteins at the desired level in plants, modifications in 
the coding region of the gene were effective. Structure-function relationships can be studied using 
site-specific mutagenesis by replacement of restriction fragments with synthetic DNA duplexes containing the 
desired nucleotide changes (Lo et al. (1984) Proc. Natl. Acad. Sci. 81:2285-2289). However, recent advances in 
recombinant DNA technology now make it feasible to chemically synthesize an entire gene designed 
specifically for a desired function. Thus, the Btt coding region was chemically synthesized and modified in such a way 
as to improve its expression in plants. Also, gene synthesis provides the opportunity to design the gene so 
as to facilitate its subsequent mutagenesis by incorporating a number of appropriately positioned restriction 
endonuclease sites into the gene. 
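
Checking where restriction sites fall, and which of them occur only once, is a simple scan. In the hedged Python sketch below, the enzyme list is an illustrative assumption: BamHI and HindIII are mentioned elsewhere in this document, but the sites actually built into the synthetic Btt gene are not specified here, and the function names are hypothetical.

```python
import re

# Illustrative enzyme set (assumption). All three recognition sites are
# palindromic, so scanning one strand finds every occurrence.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT", "PstI": "CTGCAG"}


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """0-based start positions of each recognition sequence, overlaps included."""
    seq = seq.upper()
    return {
        name: [m.start() for m in re.finditer(f"(?={motif})", seq)]
        for name, motif in sites.items()
    }


def unique_sites(seq: str, sites: dict[str, str] = SITES) -> list[str]:
    """Enzymes that cut exactly once, i.e. candidates for later fragment replacement."""
    return [name for name, hits in find_sites(seq, sites).items() if len(hits) == 1]


if __name__ == "__main__":
    seq = "AAGGATCCTTAAGCTTCTGCAG"  # hypothetical test sequence
    print(find_sites(seq))  # {'BamHI': [2], 'HindIII': [10], 'PstI': [16]}
```

The zero-width lookahead `(?=...)` lets overlapping matches be reported, which plain `finditer` on the motif would miss.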

The present invention provides a synthetic Bt gene for a crystal protein toxic to an insect. As exemplified 
herein, this protein is toxic to coleopteran insects. To improve expression of this insecticidal protein 
in plants, this invention provides a DNA segment homologous to a Btt structural gene and, as exemplified 
herein, having approximately 85% homology to the Btt structural gene in p544Pst-Met5. In this embodiment, the 
structural gene encoding a Btt insecticidal protein is obtained through chemical synthesis of the coding region. 
A chemically synthesized gene is used in this embodiment because it best allows for easy and efficacious 
accommodation of modifications in nucleotide sequences required to achieve improved levels of 
cross-expression. 

Today, in general, chemical synthesis is a preferred method to obtain a desired modified gene. However, 
to date, no plant protein gene has been chemically synthesized, nor has any synthetic gene for a bacterial 
protein been expressed in plants. In this invention, the approach adopted for synthesizing the gene consists of 
designing an improved nucleotide sequence for the coding region and assembling the gene from chemically 
synthesized oligonucleotide segments. In designing the gene, the coding region of the naturally-occurring 
gene, preferably from the Btt subclone p544Pst-Met5, encoding a 65 kDa polypeptide having coleopteran 
toxicity, is scanned for possible modifications which would result in improved expression of the synthetic gene in 
plants. For example, to optimize the efficiency of translation, codons preferred in highly expressed proteins 
of the host cell are utilized. 

Bias in codon choice within genes in a single species appears related to the level of expression of the protein encoded by that gene. Codon bias is most extreme in highly expressed proteins of E. coli and yeast. In these organisms, a strong positive correlation has been reported between the abundance of an isoaccepting tRNA species and the favored synonymous codon. In one group of highly expressed proteins in yeast, over 96% of the amino acids are encoded by only 25 of the 61 available codons (Bennetzen and Hall (1982) J. Biol. Chem. 257:3026-3031). These 25 codons are preferred in all sequenced yeast genes, but the degree of preference varies with the level of expression of the genes. Recently, Hoekema and colleagues ((1987) Mol. Cell. Biol. 7:2914-2924) reported that replacement of these 25 preferred codons by minor codons in the 5' end of the highly expressed yeast gene PGK1 results in a decreased level of both protein and mRNA. They concluded that biased codon choice in highly expressed genes enhances translation and is required for maintaining mRNA stability in yeast. Without doubt, the degree of codon bias is an important factor to consider when engineering high expression of heterologous genes in yeast and other systems.

Experimental evidence obtained from point mutations and deletion analysis has indicated that, in eukaryotic genes, specific sequences are associated with post-transcriptional processing, RNA destabilization, translational termination, intron splicing and the like. These findings are preferably taken into account in the synthetic genes of this invention. In designing a bacterial gene for expression in plants, sequences which interfere with the efficacy of gene expression are eliminated.

In designing a synthetic gene, modifications in the nucleotide sequence of the coding region are made to adjust the A+T content of the synthetic gene to reflect that normally found in genes for highly expressed proteins native to the host cell. Preferably the A+T content of the synthetic gene is substantially equal to that of said genes for highly expressed proteins. In genes encoding highly expressed plant proteins, the A+T content is approximately 55%. It is preferred that the synthetic gene have an A+T content near this value, and not so high as to destabilize the RNA and thereby lower protein expression levels. More preferably, the A+T content is no more than about 60%, and most preferably it is about 55%. Also, for ultimate expression in plants, the synthetic gene nucleotide sequence is preferably modified to form a plant initiation sequence at the 5' end of the coding region. In addition, particular attention is preferably given to ensure that unique restriction sites are placed in strategic positions, to allow efficient assembly of oligonucleotide segments during construction of the synthetic gene and to facilitate subsequent nucleotide modification. As a result of these modifications in the coding region of the native Bt gene, the preferred synthetic gene is expressed in plants at an enhanced level when compared to that observed with natural Bt structural genes.
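The A+T targets above (about 55%, and preferably no more than about 60%) are simple base-composition percentages. The sketch below computes the overall A+T content of a sequence and, as an additional check that the text does not specify, lists sliding windows that exceed the 60% ceiling. The 50-nt default window is a hypothetical choice; the point is that a gene can meet the overall target while still containing local A+T-rich stretches.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def at_rich_windows(seq: str, window: int = 50, limit: float = 60.0) -> list:
    """0-based starts of sliding windows whose A+T content exceeds `limit`."""
    return [i for i in range(len(seq) - window + 1)
            if at_content(seq[i:i + window]) > limit]


print(at_content("ATGCATGCAA"))                           # 60.0
print(at_rich_windows("AAAAGGGG", window=4, limit=60.0))  # [0, 1]
```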

In specific embodiments, the synthetic Bt gene of this invention encodes a Btt protein toxic to coleopteran insects. Preferably, the toxic polypeptide is about 598 amino acids in length, is at least 75% homologous to a Btt polypeptide and, as exemplified herein, is essentially identical to the protein encoded by p544Pst-Met5, except for replacement of threonine by alanine at residue 2. This amino acid substitution is a consequence of the need to introduce a guanine base at position +4 in the coding sequence.

In designing the synthetic gene of this invention, the coding region from the Btt subclone p544Pst-Met5, encoding a 65 kDa polypeptide having coleopteran toxicity, is scanned for possible modifications which would result in improved expression of the synthetic gene in plants. For example, in preferred embodiments the synthetic insecticidal protein is strongly expressed in dicot plants, e.g., tobacco, tomato, cotton, etc., and hence a synthetic gene under these conditions is designed to incorporate, to advantage, codons used preferentially by highly expressed dicot proteins. In embodiments where enhanced expression of insecticidal proteins is desired in monocot plants, codons used preferentially by highly expressed monocot proteins are employed.
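In Table 1, column (C) for the synthetic Btt gene closely tracks column (A) for dicot genes (for example GGA 0.37 vs 0.37 and GAG 0.52 vs 0.52). This is consistent with codons being allocated roughly in proportion to dicot usage rather than always choosing the single most frequent codon. The sketch below shows one deterministic way to do such proportional back-translation. The "largest deficit" rule, the function name and the three-amino-acid usage table (values copied from Table 1, column A) are illustrative assumptions, not the procedure stated in the text. A real design would also apply the A+T, motif and restriction-site constraints described in this section.

```python
from collections import Counter, defaultdict

# Dicot distribution fractions from Table 1, column (A), for three amino acids.
DICOT_USAGE = {
    "G": {"GGG": 0.12, "GGA": 0.37, "GGT": 0.35, "GGC": 0.16},
    "E": {"GAG": 0.52, "GAA": 0.48},
    "K": {"AAG": 0.61, "AAA": 0.39},
}


def assign_codons(protein: str, usage: dict) -> str:
    """Back-translate `protein` so each amino acid's codon mix tracks `usage`.

    At every residue, the codon with the largest deficit (target share of the
    residues seen so far minus the number already used) is chosen; ties go to
    the codon listed first.
    """
    used = defaultdict(Counter)  # amino acid -> codons used so far
    seen = Counter()             # amino acid -> residues processed so far
    codons = []
    for aa in protein:
        seen[aa] += 1
        targets = usage[aa]
        best = max(targets, key=lambda c: targets[c] * seen[aa] - used[aa][c])
        used[aa][best] += 1
        codons.append(best)
    return "".join(codons)


dna = assign_codons("K" * 10, DICOT_USAGE)
print(Counter(dna[i:i + 3] for i in range(0, len(dna), 3)))
# Counter({'AAG': 6, 'AAA': 4})
```

Ten lysines are split 6:4 between AAG and AAA, matching the 0.61/0.39 dicot fractions as closely as whole codons allow.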

In general, genes within a taxonomic group exhibit similarities in codon choice, regardless of the function of these genes. Thus an estimate of the overall use of the genetic code by a taxonomic group can be obtained by summing codon frequencies of all its sequenced genes. This species-specific codon choice is reported in this invention from an analysis of 208 plant genes. Monocot and dicot plants are analyzed individually to determine whether these broader taxonomic groups are characterized by different patterns of synonymous codon preference. The 208 plant genes included in the codon analysis code for proteins having a wide range of functions, and they represent 6 monocot and 36 dicot species. These proteins are present in different plant tissues at varying levels of expression.
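Pooling codon counts over a group's genes and normalizing within each amino acid gives "distribution fractions" of the kind tabulated in Table 1. The sketch below does this for a list of in-frame coding sequences using the standard genetic code. The function name and the toy input are illustrative; the treatment of stop codons as one group ('*') follows the "End" rows of Table 1.

```python
from collections import Counter
from itertools import product

# Standard genetic code; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def distribution_fractions(coding_seqs) -> dict:
    """Pool in-frame codon counts over all sequences, then express each codon
    as a fraction of all codons for the same amino acid (or of all stops)."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODE:  # skip codons containing ambiguous bases
                counts[codon] += 1
    per_aa = Counter()
    for codon, n in counts.items():
        per_aa[CODE[codon]] += n
    return {codon: n / per_aa[CODE[codon]] for codon, n in counts.items()}


fractions = distribution_fractions(["ATGAAGAAATAA", "ATGAAGTGA"])
print(round(fractions["AAG"], 2), fractions["TAA"])  # 0.67 0.5
```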

[...] the use of synonymous codons differs between the monocots and the dicots. In general, the most important factor in discriminating between monocot and dicot patterns of codon usage is the G+C content of the degenerate third base (codon position III). In monocots, [...] of 18 amino acids favor G+C in this position, while dicots favor G+C in only 7 of 18 amino acids.

[...] monocots and dicots because they [...] strongly avoided in plants (Boudraa (1987) Genet. S[...]) [...] (Grantham et al. (1985) Bull. Inst. Pasteur 83:95-148), possibly due [...] codon, while in monocots this [...] eukaryotes, and this is true [...]

Grantham and colleagues ((1986) Oxford Surveys in Evol. Biol. 3:48-81) have developed two codon choice indices, XCG/XCC and XTA/XTT. XCG/XCC is the ratio of XCG to XCC triplets, and XTA/XTT is the ratio of A-ending to T-ending XT triplets. [...] the plant data (Table 2) support the conclusion that monocot and dicot species differ in their use of these dinucleotides.
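Read literally, the notation makes X any first base, so both indices depend only on codon positions II and III. The sketch below computes them for the in-frame codons of one sequence and scales by 100, as in Table 2. How the pooled group values in Table 2 were formed is not stated here. Summing counts over all genes of a group before taking the ratio is an assumption, as are the function names.

```python
from collections import Counter


def _ratio(num: int, den: int) -> float:
    """100 * num / den, or NaN when the denominator is zero."""
    return 100.0 * num / den if den else float("nan")


def doublet_indices(coding_seq: str) -> tuple:
    """Return (XCG/XCC, XTA/XTT) x 100 for the in-frame codons of a sequence.

    X is any first base, so only codon positions II-III are inspected.
    """
    seq = coding_seq.upper()
    ends = Counter(seq[i + 1:i + 3] for i in range(0, len(seq) - 2, 3))
    return _ratio(ends["CG"], ends["CC"]), _ratio(ends["TA"], ends["TT"])


# Codons GCG, ACC, TCC, CTA, ATT: one CG vs two CC ends, one TA vs one TT end.
print(doublet_indices("GCGACCTCCCTAATT"))  # (50.0, 100.0)
```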



Table 2
CG and TA doublets in codon positions II-III. XCG/XCC and XTA/XTT values are multiplied by 100.

Group      Plants  Dicots  Monocots  Maize  Soybean  RuBPC SSU  CAB
XCG/XCC    40      30      61        67     37       18         22
XTA/XTT    37      35      47        43     41       9          13

RuBPC SSU = ribulose 1,5-bisphosphate carboxylase small subunit
CAB = chlorophyll a/b binding protein



[...] species-specific codon usage profiles were calculated [...]. The maize codon usage pattern resembles that of monocots in general, since these sequences [...] sequences available. The codon profile of the maize subsample is even [...] preference for G+C in codon position III. On the other hand, the soybean codon usage pattern is almost identical to the general dicot pattern, even though it represents a much smaller portion of the entire dicot sample.

In order to determine whether the coding strategy of highly expressed genes such as ribulose 1,5-bisphosphate carboxylase small subunit (RuBPC SSU) and chlorophyll a/b binding protein (CAB) is more biased than that of plant genes in general, codon usage profiles for subsets of these genes (19 and 17 sequences, respectively) were calculated (not shown). The pooled RuBPC SSU and CAB samples are characterized by stronger avoidance of the codons XCG and XTA than the larger monocot and dicot samples (Table 2). Although most of the genes in these subsamples are dicot in origin (17/19 and 15/17), their codon profile resembles that of the monocots in that G+C is preferred at the degenerate codon position III.

The use of pooled data for highly expressed genes may obscure identification of species-specific patterns in codon choice. Therefore, the codon choices of individual genes for RuBPC SSU and CAB were tabulated. The preferred codons of the maize and wheat genes for RuBPC SSU and CAB are in general more restricted than those of the dicot species. This is in agreement with Matsuoka et al. ((1987) J. Biochem. 102:673-676), who noted the extreme codon bias of the maize RuBPC SSU gene as well as of two other highly expressed genes in maize leaves, CAB and phosphoenolpyruvate carboxylase. These genes almost completely avoid the use of A+T in codon position III, although this codon bias was not as pronounced in non-leaf proteins such as alcohol dehydrogenase, the zein 22 kDa subunit, sucrose synthetase and the ATP/ADP translocator. Since the wheat SSU and CAB genes have a similar pattern of codon preference, this may reflect a common monocot pattern for these highly expressed genes in leaves. The CAB gene of Lemna and the RuBPC SSU genes of Chlamydomonas share a similar extreme preference for G+C in codon position III. In dicot CAB genes, however, A+T degenerate bases are preferred for some synonymous codons (e.g., GCT for Ala, CTT for Leu, GGA and GGT for Gly). In general, the G+C preference is less pronounced for both RuBPC SSU and CAB genes in dicots than in monocots.
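The G+C preference at codon position III discussed above can be summarized as a single percentage, often called GC3 (a term not used in the text). In the minimal sketch below, the function name is illustrative, and it counts every in-frame codon. A stricter measure would exclude ATG, TGG and stop codons, whose third base is not freely degenerate.

```python
def gc3(coding_seq: str) -> float:
    """Percent of in-frame codons with G or C at codon position III."""
    seq = coding_seq.upper()
    thirds = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)


# Dicot-CAB-style choices named in the text (GCT Ala, CTT Leu, GGA Gly)
# versus G+C-ending synonyms for the same three residues.
print(gc3("GCTCTTGGA"), gc3("GCCCTGGGC"))  # 0.0 100.0
```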

In designing a synthetic gene for expression in plants, attempts are also made to eliminate sequences which interfere with the efficacy of gene expression. Sequences such as plant polyadenylation signals (e.g., AATAAA), polymerase II termination sequences (e.g., CAN(7-9)AGTNNAA), UCUUCGG hairpins and plant consensus splice sites are highlighted and, if present in the native Btt coding sequence, are modified so as to eliminate potentially deleterious sequences.
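The motifs named above can be located with regular expressions. In the minimal sketch below, N is read as any base, CAN(7-9) as CA followed by seven to nine such bases, and the RNA hairpin UCUUCGG is searched as its DNA form TCTTCGG. Plant consensus splice sites are omitted because the text does not give their sequence. Only the given strand is scanned. These choices and the function name are assumptions.

```python
import re

# Motifs named in the text, written as DNA regular expressions.
MOTIFS = {
    "polyadenylation signal": "AATAAA",
    "pol II termination": "CA[ACGT]{7,9}AGT[ACGT]{2}AA",
    "UCUUCGG hairpin (DNA form)": "TCTTCGG",
}


def scan_motifs(seq: str) -> list:
    """Return (motif, 0-based start, matched text) for every hit.

    Each pattern is wrapped in a lookahead so overlapping hits are reported.
    """
    seq = seq.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start(), m.group(1)))
    return hits


print(scan_motifs("GGAATAAAGGCATTTTTTTAGTCCAA"))
# [('polyadenylation signal', 2, 'AATAAA'),
#  ('pol II termination', 10, 'CATTTTTTTAGTCCAA')]
```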

Modifications in the nucleotide sequence of the Btt coding region are also preferably made to reduce its A+T content. The Btt coding region has an A+T content of 64%, about 10 percentage points higher than that found in a typical plant coding region. Since A+T-rich regions typify plant intergenic and regulatory regions, it is deemed prudent to reduce the A+T content. The synthetic Btt gene is designed to have an A+T content of 55%, in keeping with values usually found in plants.

Also, in the preferred embodiment, a single modification (introducing guanine in place of adenine) is made at the fourth nucleotide position of the Btt coding sequence to form a sequence consonant with that believed to function as a plant initiation sequence (Taylor et al. (1987) Mol. Gen. Genet. 210:572-577), so as to optimize expression. In addition, in exemplifying this invention, thirty-nine nucleotides (thirteen codons) are added to the coding region of the synthetic gene in an attempt to stabilize primary transcripts. However, it appears that equally stable transcripts are obtained in the absence of this 39-nucleotide extension.
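The +4 change explains the threonine-to-alanine substitution at residue 2 noted earlier. Any ACN (Thr) codon becomes GCN (Ala) when its first base is changed to G. The sketch below applies the change to the Met-Thr-Ala start carried by the linker described in Example 1 and translates both versions with the standard genetic code. The function names are illustrative.

```python
from itertools import product

# Standard genetic code; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate complete in-frame codons (one-letter amino acid symbols)."""
    seq = seq.upper()
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def set_plus4_g(coding_seq: str) -> str:
    """Return a copy with G at position +4, numbering the A of ATG as +1."""
    if not coding_seq.upper().startswith("ATG"):
        raise ValueError("coding sequence must start with ATG")
    return coding_seq[:3] + "G" + coding_seq[4:]


native = "ATGACTGCA"  # Met-Thr-Ala start, as in the p544Pst-Met5 linker
modified = set_plus4_g(native)
print(modified, translate(native), translate(modified))  # ATGGCTGCA MTA MAA
```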

Not all of the above-mentioned modifications of the natural Bt gene need be made in constructing a synthetic Bt gene in order to obtain enhanced expression. For example, a synthetic gene may be synthesized for other purposes in addition to that of achieving enhanced levels of expression. Under these conditions, the original sequence of the natural Bt gene may be preserved within a region of DNA corresponding to one or more, but not all, of the segments used to construct the synthetic gene. Depending on the desired purpose of the gene, modification may encompass substitution of one or more, but not all, of the oligonucleotide segments used to construct the synthetic gene by a corresponding region of natural Bt sequence.

As is known to those skilled in the art of synthesizing genes (Mandecki et al. (1985) Proc. Natl. Acad. Sci. USA 82:3543-3547; Ferretti et al. (1986) Proc. Natl. Acad. Sci. USA 83:599-603), the DNA sequence to be synthesized is divided into segment lengths which can be synthesized conveniently and without undue complication. As exemplified herein, in preparing to synthesize the Btt gene, the coding region is divided into thirteen segments (A to M). Each segment has unique restriction sequences at its cohesive ends. Segment A, for example, is 228 base pairs in length and is constructed from six oligonucleotide sections, each containing approximately 75 bases. Single-stranded oligonucleotides are annealed and ligated to form DNA segments. The protruding cohesive ends of complementary oligonucleotide segments are four to five residues long. In the strategy devised for gene synthesis, the sites designed for joining oligonucleotide pieces and DNA segments are different from the restriction sites created in the gene.
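The division of segment A (228 bp, six oligonucleotides of about 75 bases, cohesive ends of four to five residues) can be sketched as follows. Top-strand oligonucleotides are cut at evenly spaced points, and bottom-strand cuts are shifted a few bases downstream, so each annealed pair carries short complementary protruding ends that guide ligation to its neighbours. Even spacing, the 4-base shift, the omission of the segment's own terminal restriction overhangs, the function names and the placeholder sequence are all simplifying assumptions.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_oligos(segment: str, n: int, overhang: int = 4):
    """Split a segment into n top-strand and n bottom-strand oligonucleotides.

    Bottom-strand breakpoints sit `overhang` bases downstream of the top-strand
    breakpoints, so neighbouring duplexes meet with complementary protruding
    ends. Both lists are written 5'->3'.
    """
    length = len(segment)
    cuts = [round(i * length / n) for i in range(1, n)]
    top_bounds = [0] + cuts + [length]
    bottom_bounds = [0] + [c + overhang for c in cuts] + [length]
    top = [segment[a:b] for a, b in zip(top_bounds, top_bounds[1:])]
    bottom = [revcomp(segment[a:b])
              for a, b in zip(bottom_bounds, bottom_bounds[1:])]
    return top, bottom


segment = "ATGC" * 57  # placeholder 228-bp sequence, not the real segment A
top, bottom = design_oligos(segment, n=3, overhang=4)
print([len(o) for o in top], [len(o) for o in bottom])  # [76, 76, 76] [80, 76, 72]
```

Three top and three bottom oligonucleotides give the six pieces described for segment A. Joining the top pieces reproduces the segment, and joining the bottom pieces in reverse order reproduces its complementary strand.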

In the specific embodiment, each DNA segment is cloned into a pIC-20 vector for amplification of the DNA. The nucleotide sequence of each fragment is determined at this stage by the dideoxy method, using the recombinant phage DNA as templates and selected synthetic oligonucleotides as primers.

As exemplified herein and illustrated schematically in Figures 3 and 4, each segment individually (e.g., segment M) is excised at the flanking restriction sites from its cloning vector and spliced into the vector containing segment A. Most often, segments are added as a pair rather than singly, to increase efficiency. Thus, the entire gene is constructed in the original plasmid harboring segment A. The nucleotide sequence of the entire gene is determined and found to correspond exactly to that shown in Figure [...].

The synthetic Btt gene is expressed in plants at an enhanced level when compared to that observed with natural Btt structural genes. To that end, the synthetic structural gene is combined with a promoter functional in plants, the structural gene and the promoter region being in such position and orientation with respect to each other that the structural gene can be expressed in a cell in which the promoter region is active, thereby forming a functional gene. The promoter regions include, but are not limited to, bacterial and plant promoter regions. To express the promoter region/structural gene combination, the DNA segment carrying the combination is contained by a cell. Combinations which include plant promoter regions are contained by plant cells, which, in turn, may be contained by plants or seeds. Combinations which include bacterial promoter regions are contained by bacteria, e.g., Bt or E. coli. Those in the art will recognize that expression in types of micro-organisms other than bacteria may in some circumstances be desirable and, given the present disclosure, feasible without undue experimentation. The recombinant DNA molecule carrying a synthetic structural gene under promoter control can be introduced into plant tissue by any means known to those
skilled in the art. The technique used for a given plant species or specific type of plant tissue depends on the known successful techniques. As novel means are developed for the stable insertion of foreign genes into plant cells and for manipulating the modified cells, skilled artisans will be able to select from known means to achieve a desired result. Means for introducing recombinant DNA into plant tissue include, but are not limited to, direct DNA uptake (Paszkowski, J. et al. (1984) EMBO J. 3:2717), electroporation (Fromm, M. et al. (1985) Proc. Natl. Acad. Sci. USA 82:5824), microinjection (Crossway, A. et al. (1986) Mol. Gen. Genet. 202:179) or T-DNA-mediated transfer from Agrobacterium tumefaciens to the plant tissue. There appears to be no fundamental limitation of T-DNA transformation to the natural host range of Agrobacterium. Successful T-DNA-mediated transformation of monocots (Hooykaas-Van Slogteren, G. et al. (1984) Nature 311:763), gymnosperms (Dandekar, A. et al. (1987) Biotechnology 5:587) and algae (Ausich, R., EPO application 108,580) has been reported. Representative T-DNA vector systems are described in the following references: An, G. et al. (1985) EMBO J. 4:277; Herrera-Estrella, L. et al. (1983) Nature 303:209; Herrera-Estrella, L. et al. (1983) EMBO J. 2:987; Herrera-Estrella, L. et al. (1985) in Plant Genetic Engineering, New York: Cambridge University Press, p. 63. Once introduced into the plant tissue, the expression of the structural gene may be assayed by any means known to the art, and expression may be measured as mRNA transcribed or as protein synthesized. Techniques are known for the in vitro culture of plant tissue and, in a number of cases, for regeneration into whole plants. Procedures for transferring the introduced expression complex to commercially useful cultivars are known to those skilled in the art.

In one of its preferred embodiments, the invention disclosed herein comprises expression in plant cells of a synthetic insecticidal structural gene under control of a plant-expressible promoter, that is to say, inserting the insecticide structural gene into T-DNA under control of a plant-expressible promoter and introducing the T-DNA containing the insert into a plant cell using known means. Once plant cells expressing a synthetic insecticidal structural gene under control of a plant-expressible promoter are obtained, plant tissues and whole plants can be regenerated therefrom using methods and techniques well known in the art. The regenerated plants are then reproduced by conventional means, and the introduced genes can be transferred to other strains and cultivars by conventional plant breeding techniques.

The introduction and expression of the synthetic structural gene for an insecticidal protein can be used to protect a crop from infestation with common insect pests. Other uses of the invention, exploiting the properties of other insecticide structural genes introduced into other plant species, will be readily apparent to those skilled in the art. The invention in principle applies to introduction of any synthetic insecticide structural gene into any plant species into which foreign DNA (in the preferred embodiment T-DNA) can be introduced and in which said DNA can remain stably replicated. In general, these taxa presently include, but are not limited to, gymnosperms and dicotyledonous plants, such as sunflower (family Compositae), tobacco (family Solanaceae), alfalfa, soybeans and other legumes (family Leguminosae), cotton (family Malvaceae) and most vegetables, as well as monocotyledonous plants. A plant containing increased levels of insecticidal protein in its tissues will control less susceptible types of insect, thus providing an advantage over present insecticidal uses of Bt. By incorporating the insecticidal protein into the tissues of a plant, the present invention additionally provides an advantage over present uses of insecticides by eliminating instances of nonuniform application and the costs of buying and applying insecticidal preparations to a field. The present invention also eliminates the need for careful timing of application of such preparations: since small larvae are most sensitive to insecticidal protein and the protein is always present, crop damage that would otherwise result from preapplication larval foraging is minimized.
This invention combines the specific teachings of the present disclosure with a variety of techniques and expedients known in the art. The choice of expedients depends on variables such as the choice of insecticidal protein from a Bt strain, the extent of modification in preferred codon usage, manipulation of sequences considered to be destabilizing to RNA or to terminate transcription prematurely, insertion of restriction sites within the design of the synthetic gene to allow future nucleotide modifications, addition of introns or enhancer sequences to the 5' and/or 3' ends of the synthetic structural gene, the promoter region, the host in which a promoter region/structural gene combination is expressed, and the like. As novel insecticidal proteins and toxic polypeptides are discovered, and as sequences responsible for enhanced cross-expression (expression of a foreign structural gene in a given host) are elucidated, those of ordinary skill will be able to select among those elements to produce "improved" synthetic genes for desired proteins having agronomic value. The fundamental aspect of the present invention is the ability to synthesize a novel gene coding for an insecticidal protein, designed so that the protein will be expressed at an enhanced level in plants yet will retain its inherent property of insect toxicity and retain or increase its specific insecticidal activity.

EXAMPLES

The following Examples are presented as illustrations of embodiments of the present invention. They do 
not limit the scope of this invention, which is determined by the claims. 

The following strains were deposited with the Patent Culture Collection, Northern Regional Research Center, 1815 N. University Street, Peoria, Illinois 61604:

Strain                          Deposited on      Accession #
E. coli MC1061 (p544-HindIII)   6 October 1987    NRRL B-18257
E. coli MC1061 (p544Pst-Met5)   6 October 1987    NRRL B-18258

The deposited strains are provided for the convenience of those in the art and are not necessary to practice the present invention, which may be practiced with the present disclosure in combination with publicly available protocols, information and materials. E. coli MC1061, a good host for plasmid transformations, was disclosed by Casadaban, M.J. and Cohen, S.N. (1980) J. Mol. Biol. 138:179-207.

Example 1: Design of the synthetic insecticidal crystal protein gene.

(i) Preparation of toxic subclones of the Btt gene 


Construction, isolation and characterization of pNSBP544 is disclosed by Sekar, V. et al. (1987) Proc. Natl. Acad. Sci. USA 84:7036-7040, and by Sekar, V. and Adang, M.J., U.S. patent application serial no. 108,285, filed October 13, 1987, which is hereby incorporated by reference. A 3.0 kbp HindIII fragment carrying the crystal protein gene of pNSBP544 is inserted into the HindIII site of pIC-20H (Marsh, J.L. et al. (1984) Gene 32:481-485), thereby yielding a plasmid designated p544-HindIII, which is on deposit. Expression in E. coli yields a 73 kDa crystal protein in addition to the 65 kDa species characteristic of the crystal protein obtained from Btt isolates.

A 5.9 kbp BamHI fragment carrying the crystal protein gene is removed from pNSBP544 and inserted into BamHI-linearized pIC-20H DNA. The resulting plasmid, p405/44-7, is digested with BglII and religated, thereby removing Bacillus sequences flanking the 3'-end of the crystal protein gene. The resulting plasmid, p405/54-12, is digested with PstI and religated, thereby removing Bacillus sequences flanking the 5'-end of the crystal protein gene and about 150 bp from the 5'-end of the crystal protein structural gene. The resulting plasmid, p405/81-4, is digested with SphI and PstI and is mixed with and ligated to a synthetic linker having the following structure:

        SD         MetThrAla
5'     CAGGATCCAACAATGACTGCA 3'
3' GTACGTCCTAGGTTGTTACTG     5'
   SphI                 PstI

(SD denotes a Shine-Dalgarno prokaryotic ribosome binding site.) The resulting plasmid, p544Pst-Met5, contains a structural gene encoding a protein identical to that encoded by pNSBP544 except for a deletion of the amino-terminal 47 amino acid residues. The nucleotide sequence of the Btt coding region is presented in Figure 1. In bioassays (Sekar and Adang, U.S. patent application serial no. 108,285, supra), the proteins encoded by the full-length Btt gene in pNSBP544 and by the N-terminal deletion derivative p544Pst-Met5 were shown to be equally toxic. All of the plasmids mentioned above have their crystal protein genes in the same orientation as the lacZ gene of the vector.
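The linker can be checked base by base. The sketch below assumes the printed alignment, in which the bottom strand (written 3' to 5') is offset four bases to the left of the top strand. It confirms that the strands pair over 17 bases, returns the protruding ends (CATG, the 3' overhang left by SphI, and TGCA, the 3' overhang left by PstI), and locates the ATG initiation codon followed in frame by the Thr and Ala codons. Variable and function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

TOP = "CAGGATCCAACAATGACTGCA"     # printed 5'->3'
BOTTOM = "GTACGTCCTAGGTTGTTACTG"  # printed 3'->5', shifted left by `shift`


def check_linker(top: str, bottom: str, shift: int):
    """Check that top[i] pairs with bottom[i + shift] and return the two
    single-stranded 3' overhangs, each read 5'->3'."""
    n_paired = len(bottom) - shift
    for t, b in zip(top[:n_paired], bottom[shift:]):
        if PAIR[t] != b:
            raise ValueError(f"mismatch: {t}/{b}")
    left = bottom[:shift][::-1]  # bottom-strand overhang (SphI end)
    right = top[n_paired:]       # top-strand overhang (PstI end)
    return left, right


print(check_linker(TOP, BOTTOM, shift=4))  # ('CATG', 'TGCA')
start = TOP.find("ATG")
print(start, TOP[start:])                  # 12 ATGACTGCA
```

The same top strand also carries a BamHI site (GGATCC) just upstream of the Shine-Dalgarno region.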

(ii) Modification of preferred codon usage 

Table 1 presents the frequency of codon usage for (A) dicot proteins, (B) Bt proteins, (C) the synthetic Btt gene and (D) monocot proteins. Although some codons for a particular amino acid are utilized to approximately the same extent by dicot and Bt proteins (e.g., the codons for serine), for the most part the distribution of codon frequency varies significantly between dicot and Bt proteins, as illustrated in columns A and B of Table 1.
Table 1.
Frequency of Codon Usage

                     Distribution Fraction
Amino  Codon  (A) Dicot  (B) Bt  (C) Synthetic  (D) Monocot
acid          genes      genes   Btt gene       genes
Gly    GGG    0.12       0.08    0.13           0.21
Gly    GGA    0.37       0.53    0.37           0.18
Gly    GGT    0.35       0.24    0.34           0.?1
Gly    GGC    0.16       0.16    0.16           0.40
Glu    GAG    0.52       0.13    0.52           0.77
Glu    GAA    0.48       0.87    0.48           0.23
Asp    GAT    0.57       0.68    0.56           0.??
Asp    GAC    0.43       0.32    0.44           0.69
Val    GTG    0.30       0.15    0.30           0.38
Val    GTA    0.12       0.32    0.10           0.07
Val    GTT    0.38       0.29    0.35           0.2?
Val    GTC    0.20       0.24    0.25           0.34
Ala    GCG    0.05       0.12    0.06           0.20
Ala    GCA    0.26       0.50    0.24           0.16
Ala    GCT    0.42       0.32    0.41           0.??
Ala    GCC    0.28       0.06    0.29           0.36
Lys    AAG    0.61       0.13    0.58           0.87
Lys    AAA    0.39       0.87    0.42           0.13
Asn    AAT    0.45       0.79    0.44           0.??
Asn    AAC    0.55       0.21    0.56           0.77
Met    ATG    1.00       1.00    1.00           1.00
Ile    ATA    0.19       0.30    0.20           0.09
Ile    ATT    0.44       0.57    0.4?           0.??
Ile    ATC    0.36       0.13    0.37           0.64
Thr    ACG    0.07       0.14    0.07           0.18
Thr    ACA    0.27       0.68    0.27           0.14
Thr    ACT    0.36       0.14    0.34           0.22
Thr    ACC    0.31       0.05    0.32           0.47
Trp    TGG    1.00       1.00    1.00           1.00
End    TGA    0.46       0.00    0.00           0.34
Cys    TGT    0.43       0.33    0.33           0.27
Cys    TGC    0.57       0.67    0.67           0.73
End    TAG    0.18       0.00    0.00           0.44
End    TAA    0.37       1.00    1.00           0.22
Tyr    TAT    0.42       0.81    0.43           0.19
Tyr    TAC    0.58       0.19    0.57           0.81
Table 1 (CONTINUED)

                     Distribution Fraction
Amino  Codon  (A) Dicot  (B) Bt  (C) Synthetic  (D) Monocot
acid          genes      genes   Btt gene       genes
Ser    TCC    0.19       0.10
Arg    AGG    0.22       0.09
Arg    CGC    0.11       0.05
Phe    TTT    0.45       0.75    0.44           0.28
Ser    AGT    0.14       0.25
Ser    AGC    0.18       0.13
Ser    TCG    0.05       0.08
Ser    TCA    0.18       0.19
Ser    TCT    0.26       0.25    0.27           0.??
                                 0.13           0.07
                                 0.19           0.25
                                 0.06           0.13
                                 0.17           0.13
                                 0.17           0.24
                                 0.23           0.28
Arg    AGA    0.31       0.50    0.32           0.08
Arg    CGG    0.04       0.14    0.05           0.14
Arg    CGA    0.09       0.14
Arg    CGT    0.23       0.09
                                 0.09           0.04
                                 0.23           0.11
                                 0.09           0.36
Gln    CAG    0.38       0.18    0.39           0.43
Gln    CAA    0.62       0.82    0.61           0.57
His    CAT    0.52       0.90    0.50           0.38
His    CAC    0.48       0.10    0.50           0.62
Leu    TTG    0.26       0.08    0.27           0.15
Leu    TTA    0.10       0.46    0.12           0.04
Leu    CTG    0.09       0.04    0.10           0.27
Leu    CTA    0.08       0.21    0.10           0.11
Leu    CTT    0.29       0.15    0.18           0.16
Leu    CTC    0.19       0.06    0.22           0.27
Pro    CCG    0.07       0.20    0.08           0.20
Pro    CCA    0.44       0.56    0.44           0.39
Pro    CCT    0.32       0.24    0.32           0.19
Pro    CCC    0.16       0.00    0.16           0.22


Publicly available Bt coding sequences and 88 coding sequences of dicot nuclear genes were used to compile the codon usage table. The pooled dicot coding sequences, obtained from GenBank, were:
TABLE 1 (CONTINUED)

GENUS/SPECIES              GENBANK     PROTEIN                                          REF
Antirrhinum majus          AMACHS      Chalcone synthase
Arabidopsis thaliana       ATHADH      Alcohol dehydrogenase
                           ATHH3GA     Histone 3 gene 1
                           ATHH3GB     Histone 3 gene 2
                           ATHH4GA     Histone 4 gene 1
                           ATHLHCP1    CAB
                           ATHTUBA     α tubulin
                                       5-enolpyruvylshikimate 3-phosphate synthase 1
Bertholletia excelsa                   High methionine storage protein 2
Brassica campestris                    Acyl carrier protein 3
Brassica napus             BNANAP      Napin
Brassica oleracea          BOLSLSGR    S-locus specific glycoprotein
Canavalia ensiformis       CENCONA     Concanavalin A
Carica papaya              CPAPAP      Papain
Chlamydomonas reinhardtii  CREC552     Preapocytochrome
                           CRERBCS1    RuBPC small subunit gene 1
                           CRERBCS2    RuBPC small subunit gene 2
Cucurbita pepo             CUCPHT      Phytochrome
Cucumis sativus            CUSGMS      Glyoxysomal malate synthetase
                           CUSLHCPA    CAB
                           CUSSSU      RuBPC small subunit
Daucus carota              DAREXT      Extensin
                           DAREXTR     33 kD extensin related protein
Dolichos biflorus          DBILECS     Seed lectin
Flaveria trinervia         FTRBCR      RuBPC small subunit
Glycine max                SOY7SAA     7S storage protein
                           SOYACTIG    Actin 1
                           SOYCIIPI    CII protease inhibitor
                           SOYGLYAIA   Glycinin A1aBx subunits
                           SOYGLYAAB   Glycinin A5A4B3 subunits
                           SOYGLYAB    Glycinin A3B4 subunits
                           SOYGLYR     Glycinin A2B1a subunits
                           SOYHSP175   Low MW heat shock proteins
TABLE 1 (CONTINUED)

GENUS/SPECIES              GENBANK     PROTEIN                                          REF
                           SOYLGBI     Leghemoglobin
                           SOYLEA      Lectin
                           SOYLOX      Lipoxygenase 1
                           SOYNOD20G   20 kDa nodulin
                           SOYNOD23G   23 kDa nodulin
                           SOYNOD24H   24 kDa nodulin
                           SOYNOD26B   26 kDa nodulin
                           SOYNOD26R   26 kDa nodulin
                           SOYNOD27R   27 kDa nodulin
                           SOYNOD35M   35 kDa nodulin
                           SOYNOD75    75 kDa nodulin
                           SOYNODR1    Nodulin C51
                           SOYNODR2    Nodulin E27
                           SOYPRP1     Proline rich protein
                           SOYRUBP     RuBPC small subunit
                           SOYURA      Urease
                           SOYHSP26A   Heat shock protein 26A

Gossypium hirsutum
Helianthus annuus          HNNRUBCS
Ipomoea batatas
Lemna gibba
Lupinus luteus
Lycopersicon esculentum

LGIAB19
LGIR5BPC
LQPLBR
TOMBIOBR
TOMETHYBR
TOMPG2AR
TOMPSI

Nuclear-encoded chloroplast heat shock protein
22 kDa nodulin
α1 tubulin
β2 tubulin
Seed α globulin (vicilin)
Seed β globulin (vicilin)
RuBPC small subunit
2S albumin seed storage protein
Wound-induced catalase
CAB
RuBPC small subunit
Leghemoglobin 1

5
6
6
7
7
8
9

Biotin binding protein
Ethylene biosynthesis protein
Polygalacturonase-2a
Tomato photosystem I protein
TABLE 1 (CONTINUED)

GENUS/SPECIES   GENBANK   PROTEIN   REF

Medicago sativa
Mesembryanthemum crystallinum
Nicotiana plumbaginifolia
Nicotiana tabacum
Persea americana
Petroselinum hortense

TOMRBCSA
TOMRBCSB
TOMRBCSC
TOMRBCSD
TOMRRD
TOMWIPIG
TOMWIPH
ALFB3R
TOBATP21
TOBECH
TOBGAPA
TOBGAPB
TOBGAPC
TOBPRIAR
TOBPRICR
TOBPRPR
TOBPXDLF
TOBRBPCO
TOBTHAUR
AVOCEL
PHOCHL

RuBPC small subunit
RuBPC small subunit
RuBPC small subunit
RuBPC small subunit
Ripening related protein
Wound induced proteinase inhibitor I
Wound induced proteinase inhibitor II
CAB 1A
CAB 1B
CAB 3C
CAB 4
CAB 5
Leghemoglobin III
RuBPC small subunit

10
10
10
11
11
11
12

Mitochondrial ATP synthase subunit
Nitrate reductase                              13
Glutamine synthetase                           14
Endochitinase
A subunit of chloroplast G3PD
B subunit of chloroplast G3PD
C subunit of chloroplast G3PD
Pathogenesis related protein 1a
Pathogenesis related protein 1c
Pathogenesis related protein 1b
Peroxidase
RuBPC small subunit
TMV-induced protein homologous to thaumatin
Cellulase
Chalcone synthase
TABLE 1 (CONTINUED)

GENUS/SPECIES GENBANK 



Petunia sp.

PETCAB13
PETCAB22L
PETCAB22R
PETCAB25
PETCAB37
PETCAB91R
PETCHSR
PETGCR1
PETRBCSOd
PETRBCS11

Phaseolus vulgaris PHVCHM
PHVDLECA
PHVDLECB
PHVGSR1
PHVGSR2
PHVLBA
PHVLECT
PHVPAL
PHVPHASAR
PHVPHASBR

Pisum sativum

PEAALB2
PEACAB80
PEAGSR1
PEALECA
PEALEGA
PEARUBPS
PEAVIC2
PEAVIC4
PEAVIC7



PROTEIN 



CAB 13
CAB 22L
CAB 22R
CAB 25
CAB 37
CAB 91R
Chalcone synthase
Glycine-rich protein
RuBPC small subunit
RuBPC small subunit
70 kDa heat shock protein 15
Chitinase
Phytohemagglutinin E
Phytohemagglutinin L
Glutamine synthetase 1
Glutamine synthetase 2
Leghemoglobin
Lectin
Phenylalanine ammonia lyase
α phaseolin
β phaseolin
Arcelin seed protein 16
Chalcone synthase 17
Seed albumin
CAB
Glutamine synthetase (nodule)
Lectin
Legumin
RuBPC small subunit
Vicilin
Vicilin
Vicilin
Alcohol dehydrogenase 1 18
Glutamine synthetase (leaf) 19
Glutamine synthetase (root) 19
Histone 1 20
TABLE 1 (CONTINUED)

GENUS/SPECIES GENBANK 



PROTEIN 



REF

Raphanus sativus

Ricinus communis RCCAGC
RCCRICIN
RCCICL4

Silene pratensis SIPFDX
SIPPCY

Sinapis alba SALGAPDH
Solanum tuberosum POTPAT
POTINHWI
POTLSIG
POTP12G
POTRBCS

Spinacia oleracea SPIACPI
SPIOEC16
SPIOEC23
SPIPCG
SPIPS33

Vicia faba

VFALBA
VFALEB4

Nuclear-encoded chloroplast heat shock protein 4
RuBPC small subunit 21
Agglutinin
Ricin
Isocitrate lyase
Ferredoxin precursor
Plastocyanin precursor
Nuclear gene for G3PD
Patatin
Wound-induced proteinase inhibitor
Light-inducible tissue specific ST-LS1 gene
Wound-induced proteinase inhibitor II
RuBPC small subunit
Sucrose synthetase 22
Acyl carrier protein I
16 kDa photosynthetic oxygen-evolving protein
23 kDa photosynthetic oxygen-evolving protein
Plastocyanin
33 kDa photosynthetic water oxidation complex precursor
Glycolate oxidase 23
Leghemoglobin
Legumin B
Vicilin 24



Pooled 53 monocot coding sequences obtained from GenBank (release 55) or, when no GenBank file name is specified, directly from the published source, were:
TABLE 1 (CONTINUED)
GENUS/SPECIES GENBANK 



Avena sativa
Hordeum vulgare

ASTAP3R
BLYALR
BLYAMY1
BLYAMY2
BLYCHORD1
BLYGLUCB
BLYHORB
BLYPAPI
BLYTHIAR
BLYUBIQR

Oryza sativa

RICGLUTG

Triticum aestivum WHTAMYA
WHTCAB
WHTEMR
WHTGIR
WHTGLGB
WHTGLIABA
WHTGLUTI
WHTH3
WHTH4091
WHTRBCB

Secale cereale RYESECGSR
Zea mays MZEAIG
MZEACTIG
MZEADH11F
MZEADH2NR
MZEALD
MZEANT
MZEEG2R

MZEEG2R 



PROTEIN 



REF

Phytochrome 3
Aleurain
α amylase 1
α amylase 2
Hordein C
β glucanase
B1 hordein
Amylase/protease inhibitor
Toxin α hordothionin
Ubiquitin
Histone 3 25
Leaf specific thionin 1 26
Leaf specific thionin 2 26
Plastocyanin 27
Glutelin
Glutelin 28
α amylase
CAB
Em protein
gibberellin responsive protein
γ gliadin
α/β gliadin Class AII
High MW glutenin
Histone 3
Histone 4
RuBPC small subunit
γ secalin
40.1 kDa A1 protein (NADPH-dependent reductase)
Actin
Alcohol dehydrogenase 1
Alcohol dehydrogenase 2
Aldolase
ATP/ADP translocator
Glutelin 2
TABLE 1 (CONTINUED)

GENUS/SPECIES GENBANK 



PROTEIN 






MZEGGST3B 
MZEH3C2 
MZEH4C14 
MZEHSP701 

MZEHSP702 

MZELHCP 

MZEMPL3 

MZEPEPCR 

MZERBCS 

MZESUSVSG 

MZETP12 

MZEZEA20M 

MZEZEA30M 

MZEZE15A3 

MZEZE16 

MZEZE19A 

MZEZE22A 

MZEZE22B 



Glutathione S transferase
Histone 3
Histone 4
70 kD Heat shock protein, exon 1
70 kD Heat shock protein, exon 2
CAB
Lipid body surface protein L3
Phosphoenolpyruvate carboxylase
RuBPC small subunit
Sucrose synthetase
Triosephosphate isomerase 1
19 kD zein
19 kD zein
15 kD zein
16 kD zein
19 kD zein
22 kD zein
22 kD zein
Catalase 2 29
Regulatory C1 locus 30
Table 1 (CONTINUED)

… of coding sequences … New York, pp. 85-99); Bt var. kurstaki HD-1 … fragment (Schnepf and Whiteley (1985) J. Biol. Chem. 260:6273-6280); and Bt var. …



1. Klee, H.J. et al. (1987) Mol. Gen. Genet. 210:437-…
2. Altenbach, S.B. et al. (1987) Plant Mol. Biol. 8:239-…
3. Rose, R.E. et al. (1987) Nucl. Acids Res. 15:7197.
4. Vierling, E. et al. (1988) EMBO J. 7:575-581.
5. Sandal, N.N. et al. (1987) Nucl. Acids Res. 15:1507-…
6. Tingey, S.V. et al. (1987) EMBO J. 6:1-9.
7. Chlan, C.A. et al. (1987) Plant Mol. Biol. 9:533-546.
8. Allen, R.D. et al. (1987) Mol. Gen. Genet. 210:211-…
9. Sakajo, S. et al. (1987) Eur. J. Biochem. 165:437-442.
10. Pichersky, E. et al. (1987) Plant Mol. Biol. 9:109-120.
11. Ray, J. et al. (1987) Nucl. Acids Res. 15:10587.
12. DeRocher, E.J. et al. (1987) Nucl. Acids Res. 15:6301.
13. Calza, R. et al. (1987) Mol. Gen. Genet. 209:552-562.
14. … Coruzzi, G.M. (1987) Plant Physiol. 84:366-373.
15. Winter, J. et al. (1988) Mol. Gen. Genet. 211:315-319.
16. Osborn, T.C. et al. (1988) Science 240:207-210.
17. Ryder, T.B. et al. (1987) Mol. Gen. Genet. 210:219-233.
18. Llewellyn, D.J. et al. (1987) J. Mol. Biol. 195:115-123.
19. Tingey, S.V. et al. (1987) EMBO J. 6:1-9.
20. Gantt, J.S. and Key, J.L. (1987) Eur. J. Biochem. 166:119-125.
21. Guidet, F. and Fourcroy, P. (1988) Nucl. Acids Res. 16:2336.
22. Salanoubat, M. and Belliard, G. (1987) Gene 60:47-56.
23. Volokita and Somerville, C.R. (1987) J. Biol. Chem. 262:15825-15828.
24. Bassner, R. et al. (1987) Nucl. Acids Res. 15:9609.
25. Chojecki, J. (1986) Carlsberg Res. Commun. 51:211-217.
26. Bohlmann, H. and Apel, K. (1987) Mol. Gen. Genet. 207:446-454.
27. Nielsen, P.S. and Gausing, K. (1987) FEBS Lett. 225:159-162.
28. Higuchi, W. and Fukazawa, C. (1987) Gene 55:245-253.
29. Bethards, L.A. et al. (1987) Proc. Natl. Acad. Sci. USA 84:6830-6834.
30. Paz-Ares, J. et al. (1987) EMBO J. 6:3553-3558.

For example, dicots utilize the AAG codon for lysine with a frequency of 61% and the AAA codon with a frequency of 39%. In contrast, in Bt proteins the lysine codons AAG and AAA are used with frequencies of 13% and 87%, respectively. It is known in the art that seldom-used codons are generally detrimental to expression in a given system and must be avoided or used judiciously. Thus, in designing a synthetic gene encoding the Btt crystal protein, individual amino acid codons found in the original Btt gene are altered to reflect the codons preferred by dicot genes for a particular amino acid. However, attention is given to maintaining the overall distribution of codons for each amino acid within the coding region of the gene. For example, in the case of alanine, it can be seen from Table 1 that the codon GCA is used in Bt proteins with a frequency of 50%, whereas the codon GCT is the preferred codon in dicot proteins. In designing the synthetic Btt gene, not all codons for alanine in the original Bt gene are replaced by GCT; instead, only some alanine codons are changed to GCT while others are replaced with different alanine codons, in an attempt to preserve the overall distribution of codons for alanine used in dicot proteins. Column C in Table 1 documents that this goal is achieved: the frequency of codon usage in dicot proteins (column A) corresponds very closely to that used in the synthetic Btt gene (column C).

In a similar manner, a synthetic gene coding for an insecticidal crystal protein can be optimized for enhanced expression in monocot plants. Column D of Table 1 presents the frequency of codon usage of highly expressed monocot proteins.
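The balancing idea described above (use the preferred codon for most, but not all, sites of an amino acid) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure the inventors used: it splits a hypothetical number of lysine sites between AAG and AAA in proportion to the dicot frequencies quoted above (61% and 39%), using largest-remainder rounding (our choice) so the counts always add up, and then measures codon usage in a coding sequence. The function names `allocate_codons` and `usage_pct` are hypothetical.

```python
from collections import Counter


def allocate_codons(n_sites, target_pct):
    """Split n_sites among synonymous codons in proportion to target_pct.

    Uses largest-remainder rounding so the counts always sum to n_sites.
    """
    total = sum(target_pct.values())
    exact = {codon: n_sites * pct / total for codon, pct in target_pct.items()}
    counts = {codon: int(x) for codon, x in exact.items()}
    leftover = n_sites - sum(counts.values())
    # Give any remaining sites to the codons with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts


def usage_pct(cds, synonymous):
    """Percent usage of each codon in `synonymous` among complete in-frame codons of cds."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    hits = Counter(c for c in codons if c in synonymous)
    n = sum(hits.values())
    return {c: (100.0 * hits[c] / n if n else 0.0) for c in synonymous}


# Dicot lysine preference quoted in the text: AAG 61%, AAA 39%.
dicot_lys = {"AAG": 61, "AAA": 39}
print(allocate_codons(7, dicot_lys))  # {'AAG': 4, 'AAA': 3}
print(usage_pct("AAGAAAAAGGCT", ["AAG", "AAA"]))
```

With seven lysine sites, exact proportional allocation would give 4.27 AAG and 2.73 AAA; rounding by largest remainder yields four AAG and three AAA, the closest whole-number match to the target distribution.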

Because of the degenerate nature of the genetic code, only part of the variation contained in a gene is expressed in its protein. Variation between degenerate base frequencies is clearly not a neutral phenomenon, since systematic codon preferences have been reported for bacterial, yeast and mammalian genes. Analysis of a large group of plant gene sequences indicates that synonymous codons are used differently by monocots and dicots. These patterns are also distinct from those reported for E. coli, yeast and man.

In general, the plant codon usage pattern more closely resembles that of man and other higher eukaryotes than that of unicellular organisms, owing to the overall preference for G+C in codon position III. Monocots in this sample share the most commonly used codon with a sample of human genes (Grantham et al. (1986) supra) for 13 of 18 amino acids, whereas dicots share the most commonly used human codon for only 7 of 18 amino acids.

Discussions of plant codon usage have focused on the differences between codon choice in plant nuclear genes and in chloroplasts. Chloroplasts differ from the nuclear genome of higher plants in that they encode only 30 tRNA species. Because chloroplasts have a restricted set of tRNA genes, the use of preferred codons by chloroplast-encoded proteins appears more extreme. However, a positive correlation has been reported between the level of isoaccepting tRNA for a given amino acid and the frequency with which the corresponding codon is used in the chloroplast genome (Pfitzinger et al. (1987) Nucl. Acids Res. 15:1377-1386).

Our analysis of the plant gene sample confirms earlier reports that the nuclear and chloroplast genomes in plants have distinct coding strategies. The codon usage of monocots in this sample is distinct from chloroplast usage, sharing the most commonly used codon for only 1 of 18 amino acids. Dicots in this sample share the most commonly used chloroplast codon for only 4 of 18 amino acids. In general, the chloroplast codon profile more closely resembles that of unicellular organisms, with a strong bias towards A+T in the degenerate third base.
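Comparisons such as "13 of 18", "1 of 18" or "4 of 18" amino acids amount to counting the amino acids whose most frequently used codon agrees between two codon usage tables. A minimal sketch of that count follows; the nested-dictionary table layout and the function names are assumptions, and the only data used are the lysine frequencies quoted earlier in this section.

```python
def preferred_codon(usage):
    """Most frequently used codon in {codon: frequency}; ties go to the first listed."""
    return max(usage, key=usage.get)


def shared_preferences(table_a, table_b):
    """Count amino acids whose most-used codon is the same in both tables.

    Each table maps amino acid -> {codon: frequency}.
    Returns (number of shared preferences, number of amino acids compared).
    """
    common = sorted(table_a.keys() & table_b.keys())
    shared = sum(preferred_codon(table_a[aa]) == preferred_codon(table_b[aa])
                 for aa in common)
    return shared, len(common)


# Lysine usage quoted in the text (percent).
dicot = {"Lys": {"AAG": 61, "AAA": 39}}
bt = {"Lys": {"AAG": 13, "AAA": 87}}
print(shared_preferences(dicot, bt))     # (0, 1)
print(shared_preferences(dicot, dicot))  # (1, 1)
```

Extending each table to all 18 amino acids with more than one codon would reproduce the kind of "n of 18" figure given above.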

In unicellular organisms, highly expressed genes use a smaller subset of codons than do weakly expressed genes, although the codons preferred are distinct in some cases. Sharp and Li (1986) Nucl. Acids Res. 14:7734-7749 report that codon usage in 165 E. coli genes reveals a positive correlation between high expression and increased codon bias. Bennetzen and Hall (1982) supra have described a similar trend in codon selection in yeast. Codon usage in these highly expressed genes correlates with the abundance of isoaccepting tRNAs in both yeast and E. coli. It has been proposed that the good fit of abundant yeast and E. coli mRNA codon usage to isoacceptor tRNA abundance promotes high translation levels and high steady-state levels of these proteins. This strongly suggests that the potential for high-level expression of plant genes in yeast or E. coli is limited by their codon usage. Hoekema et al. (1987) supra report that replacement of the 25 most favored yeast codons with rare codons in the 5' end of the highly expressed gene PGK1 leads to a decrease in both mRNA and protein. These results indicate that codon bias should be emphasized when engineering high expression of foreign genes in yeast and other systems.

(iii) Sequences within the Btt coding region having potentially destabilizing influences

Analysis of the Btt gene reveals that A+T accounts for 64% of the base composition of the coding region. This level of A+T is about 10% higher than that found in a typical plant coding region. High A+T regions are most often found in intergenic regions, and many plant regulatory sequences are observed to be AT-rich. These observations suggest that an elevated A+T content within the Btt coding region may contribute to its low expression level in plants. Consequently, in designing a synthetic Btt gene, the A+T content is decreased to more closely approximate the A+T levels found in plant genes. As illustrated in Table 3, the A+T content is lowered to a level in keeping with that found in coding regions of plant nuclear genes. The synthetic Btt gene of this invention has an A+T content of 55%.
Table 3.

Adenine + Thymine Content in Btt Coding Region

                        Base
Coding region           G      A      T      C      %G+C   %A+T
Natural Btt gene        341    633    514    306    36     64
Synthetic Btt gene      392    530    483    428    45     55
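The percentages in Table 3 follow directly from the base counts. The sketch below (function names are illustrative, not from the source) recomputes them; note that the counts total 1794 bases for the natural gene and 1833 bases for the synthetic gene.

```python
from collections import Counter


def base_counts(seq):
    """Count G, A, T and C in a DNA sequence (case-insensitive)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "GATC"}


def gc_at_percent(counts):
    """Return (%G+C, %A+T) from a {base: count} mapping."""
    total = sum(counts[base] for base in "GATC")
    gc = 100.0 * (counts["G"] + counts["C"]) / total
    return gc, 100.0 - gc


# Base counts from Table 3.
natural = {"G": 341, "A": 633, "T": 514, "C": 306}
synthetic = {"G": 392, "A": 530, "T": 483, "C": 428}
for name, counts in (("natural", natural), ("synthetic", synthetic)):
    gc, at = gc_at_percent(counts)
    print(name, sum(counts.values()), round(gc), round(at))
# natural 1794 36 64
# synthetic 1833 45 55
```

For an actual sequence, `gc_at_percent(base_counts(seq))` gives the same two figures directly.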



In addition, the natural Btt gene is scanned for sequences that are potentially destabilizing to Btt RNA. These sequences, when identified in the original Btt gene, are eliminated through modification of the nucleotide sequence. Included in this group of potentially destabilizing sequences are:

(a) plant polyadenylation signals (as described by Joshi (1987) Nucl. Acids Res. 15:9627-9640). In eukaryotes, the primary transcripts of nuclear genes are extensively processed (steps including 5'-capping, intron splicing and polyadenylation) to form mature, translatable mRNAs. In higher plants, polyadenylation involves endonucleolytic cleavage at the polyA site followed by the addition of several A residues to the cleaved end. The selection of the polyA site is presumed to be cis-regulated. During expression of Bt protein and RNA in different plants, the present inventors have observed that the polyadenylated mRNA isolated from these expression systems is not full-length but instead is truncated or degraded. Hence, in the present invention it was decided to minimize possible destabilization of RNA by eliminating potential polyadenylation signals within the coding region of the synthetic Btt gene. Plant polyadenylation signals, including the AATAAA, AATGAA, AATAAT, AATATT, GATAAA, GATAAA and AATAAG motifs, do not appear in the synthetic Btt gene when it is scanned for exact matches (0 mismatches) to these sequences.

(b) the polymerase II termination sequence, CAN7-9AGTNNAA (where N7-9 denotes a run of 7 to 9 nucleotides of any identity). This sequence was shown (Vankan and Filipowicz (1988) EMBO J. 7:791-799) to lie next to the 3' end of the coding region of the U2 snRNA genes of Arabidopsis thaliana and is believed to be important for transcription termination upon 3' end processing. The synthetic Btt gene is devoid of this termination sequence.

(c) CUUCGG hairpins, responsible for extraordinarily stable RNA secondary structures associated with various biochemical processes (Tuerk et al. (1988) Proc. Natl. Acad. Sci. 85:1364-1368). The exceptional stability of CUUCGG hairpins suggests that they have an unusual structure and may function in organizing the proper folding of complex RNA structures. CUUCGG hairpin sequences are not found with either 0 or 1 mismatches in the Btt coding region.

(d) plant consensus splice sites, 5' = AAG:GTAAGT and 3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C, as described by Brown et al. (1986) EMBO J. 5:2749-2758. Consensus sequences for the 5' and 3' splice junctions have been derived from 20 and 30 plant intron sequences, respectively. Although it is unlikely that such potential splice sequences are present in Bt genes, a search was made for sequences resembling plant consensus splice sites in the synthetic Btt gene. For the 5' splice site, the closest matches contained three mismatches. This search gave 12 sequences, two of which had G:GT at the junction. Only the site at position 948 was changed, because the site at position 1323 contains the KpnI site needed for reconstruction. The 3' splice site is not found in the synthetic Btt gene.

Thus, by identifying potential RNA-destabilizing sequences, the synthetic Btt gene is designed to eliminate known eukaryotic regulatory sequences that affect RNA synthesis and processing.
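The scans described in items (a) to (d) can be sketched as a fixed-length motif search that tolerates a chosen number of mismatches, plus a regular expression for the variable-length termination sequence of item (b). This is a minimal illustration, not the software the inventors used: the function names are hypothetical, only the strand given is scanned, and the CUUCGG hairpin motif is written in its DNA form, CTTCGG.

```python
import re

# Motifs from item (a); GATAAA appears twice in the source list but only needs scanning once.
POLYA_SIGNALS = ["AATAAA", "AATGAA", "AATAAT", "AATATT", "GATAAA", "AATAAG"]
# Item (c): CUUCGG written as DNA.
HAIRPIN = ["CTTCGG"]

# Item (b): CA, 7 to 9 bases of any identity, AGT, 2 bases of any identity, AA.
# The lookahead lets overlapping candidates be reported (one match per start position).
POL2_TERMINATOR = re.compile(r"(?=(CA[ACGT]{7,9}AGT[ACGT]{2}AA))")


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_motifs(seq, motifs, max_mismatches=0):
    """Return (1-based position, motif, matched site) for every window of seq
    within max_mismatches of a motif. Scans the given strand only."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        k = len(motif)
        for i in range(len(seq) - k + 1):
            site = seq[i:i + k]
            if mismatches(site, motif) <= max_mismatches:
                hits.append((i + 1, motif, site))
    return hits


def find_pol2_terminators(seq):
    """Return (1-based position, matched site) for each CAN7-9AGTNNAA match."""
    return [(m.start() + 1, m.group(1)) for m in POL2_TERMINATOR.finditer(seq.upper())]


print(find_motifs("GGAATAAAGG", POLYA_SIGNALS))            # [(3, 'AATAAA', 'AATAAA')]
print(find_motifs("ACTTCGAT", HAIRPIN, max_mismatches=1))  # [(2, 'CTTCGG', 'CTTCGA')]
```

An empty result from each function for the finished coding sequence corresponds to the statements above that a given motif "does not appear" at the stated mismatch level.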

Example 2. Chemical synthesis of a modified Btt structural gene

(i) Synthesis Strategy

The general plan for synthesizing linear double-stranded DNA sequences coding for the crystal protein from Btt is shown schematically in Figure 2. The optimized DNA coding sequence (Figure 1) is divided into thirteen segments (segments A-M) to be synthesized individually, isolated and purified. As shown in Figure 2, the general strategy begins by enzymatically joining segments A and M to form segment AM, to which segment BL is added to form segment ABLM. Segment CK is then added enzymatically to make segment ABCKLM, which is enlarged through sequential addition of segments DJ, EI and FGH to give, finally, the total segment ABCDEFGHIJKLM, representing the entire coding region of the Btt gene.
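The nesting order can be made concrete with a toy model that tracks only segment letters, not the underlying chemistry or how each pair of segments is delivered. Each step splices a new 5'-side segment and 3'-side segment into the middle of the growing construct; the FGH block is first built the same way (F + H, then G). The function `assemble` is a hypothetical illustration.

```python
def assemble(outer, pairs, core=None):
    """Toy model of the nested assembly order (segment letters only).

    outer: the first two segments joined, e.g. ("A", "M").
    pairs: later (5'-side, 3'-side) segment pairs, each spliced into the middle.
    core:  an optional final block inserted at the very centre.
    Returns the name of the construct after every step.
    """
    left, right = [outer[0]], [outer[1]]
    steps = ["".join(left + right)]
    for five_prime, three_prime in pairs:
        left.append(five_prime)       # grows inward from the 5' end
        right.insert(0, three_prime)  # grows inward from the 3' end
        steps.append("".join(left + right))
    if core is not None:
        steps.append("".join(left + [core] + right))
    return steps


fgh = assemble(("F", "H"), [], core="G")[-1]  # FH, then FGH
for step in assemble(("A", "M"), [("B", "L"), ("C", "K"), ("D", "J"), ("E", "I")], core=fgh):
    print(step)
# AM, ABLM, ABCKLM, ABCDJKLM, ABCDEIJKLM, ABCDEFGHIJKLM (one per line)
```

The printed names match the intermediate constructs named in the text, which is a quick way to confirm that each added pair brackets the previous insertion point.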

Figure 3 outlines in more detail the strategy used in combining individual DNA segments to effect the synthesis of a gene having unique restriction sites integrated into a defined nucleotide sequence. Each of the thirteen segments (A to M) has unique restriction sites at both ends, allowing the segment to be strategically spliced into a growing DNA polymer. Also, unique sites are placed at each end of the gene to enable easy transfer from one vector to another.

The thirteen segments (A to M) used to construct the synthetic gene vary in size. Oligonucleotide pairs of approximately 75 nucleotides each are used to construct larger segments of approximately 225 base pairs. Figure 3 documents the number of base pairs contained within each segment and specifies the unique restriction sites bordering each segment. The overall strategy for incorporating specific segments at the appropriate splice sites is also detailed in Figure 3.

(ii) Preparation of oligodeoxynucleotides

Preparation of oligodeoxynucleotides for use in the synthesis of a DNA sequence comprising a gene for Btt is carried out according to the general procedures described by Matteucci et al. (1981) J. Am. Chem. Soc. 103:3185-3192 and Beaucage et al. (1981) Tetrahedron Lett. 22:1859-1862. All oligonucleotides are prepared by the solid-phase phosphoramidite triester coupling approach, using an Applied Biosystems Model 380A DNA synthesizer. Deprotection and cleavage of the oligomers from the solid support are carried out according to standard procedures. Crude oligonucleotide mixtures are purified using an oligonucleotide purification cartridge (OPC, Applied Biosystems) as described by McBride et al. (1988) Biotechniques 6:362-367.

5'-Phosphorylation of oligonucleotides is performed with T4 polynucleotide kinase. The reaction contains 2 µg oligonucleotide and 18.2 units polynucleotide kinase (Pharmacia) in linker kinase buffer (Maniatis (1982) Cloning Manual, Fritsch and Sambrook (eds.), Cold Spring Harbor Laboratory, Cold Spring Harbor, NY). The reaction is incubated at 37°C for 1 hour.

Oligonucleotides are annealed by first heating to 95°C for 5 min and then allowing complementary pairs to cool slowly to room temperature. Annealed pairs are reheated to 65°C, and the solutions are combined, cooled slowly to room temperature and kept on ice until used. The ligated mixture may be purified by electrophoresis through a 4% NuSieve agarose (FMC) gel. The band corresponding to the ligated duplex is excised, and the DNA is extracted from the agarose and ethanol precipitated.

Ligations are carried out as exemplified by the procedure used for M segment ligations. M segment DNA is brought to 65°C for 25 min, the desired vector is added and the reaction mixture is incubated at 65°C for 15 min. The reaction is slowly cooled to room temperature over 1.5 hours. ATP (to 0.5 mM), 3.5 units of T4 DNA ligase and ligase salts are added, and the reaction mixture is incubated for 2 hr at room temperature and then maintained overnight at 15°C. The next morning, vectors that have not been ligated to M block DNA are removed upon linearization by EcoRI digestion. Vectors ligated to the M segment DNA are used to transform E. coli MC1061. Colonies containing inserted blocks are identified by colony hybridization with 32P-labelled oligonucleotide probes. The sequence of the DNA segment is confirmed by isolating plasmid DNA and sequencing using the dideoxy method of Sanger et al. (1977) Proc. Natl. Acad. Sci. 74:5463-5467.
(iii) Synthesis of Segment AM

Three oligonucleotide pairs (A1 and its complementary strand A1c, A2 and A2c, and A3 and A3c) are assembled and ligated as described above to make up segment A. The nucleotide sequence of segment A is as follows:
In Table 4, bold lines demarcate the individual oligonucleotides. Fragment A1 contains 71 bases, A1c has 76 bases, A2 has 75 bases, A2c has 76 bases, A3 has 82 bases and A3c has 76 bases. In all, segment A is composed of 228 base pairs and is contained between an EcoRI restriction enzyme site and a destroyed EcoRI site (5'). (Additional restriction sites within segment A are indicated.) The EcoRI single-stranded cohesive ends allow segment A to be annealed and then ligated to the EcoRI-cut cloning vector, pIC20K.

Segment M comprises three oligonucleotide pairs: M1, 80 bases; M1c, 86 bases; M2, 87 bases; M2c, 87 bases; M3, 85 bases; and M3c, 79 bases. The individual oligonucleotides are annealed and ligated according to standard procedures as described above. The overall nucleotide sequence of segment M is:
In Table 5, bold lines demarcate the individual oligonucleotides. Segment M contains 252 base pairs and has destroyed EcoRI restriction sites at both ends. (Additional restriction sites within segment M are indicated.)

Segment M is inserted into vector pIC20R at an EcoRI restriction site and cloned.
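As a simple arithmetic cross-check (our addition, not part of the described procedure), the oligonucleotide lengths given for segments A and M add up to the stated segment sizes on each strand: 71 + 75 + 82 = 76 + 76 + 76 = 228 for segment A, and 80 + 87 + 85 = 86 + 87 + 79 = 252 for segment M. Individual top- and bottom-strand oligonucleotides differ in length, presumably because the pairs are staggered so that adjacent duplexes overlap; that interpretation is an assumption. The data layout below is hypothetical.

```python
# Oligonucleotide lengths (bases) as listed for segments A and M.
SEGMENTS = {
    "A": {"top": {"A1": 71, "A2": 75, "A3": 82},
          "bottom": {"A1c": 76, "A2c": 76, "A3c": 76}},
    "M": {"top": {"M1": 80, "M2": 87, "M3": 85},
          "bottom": {"M1c": 86, "M2c": 87, "M3c": 79}},
}
STATED_BP = {"A": 228, "M": 252}


def strand_totals(segment):
    """Total bases on the top and bottom strands of one segment."""
    return sum(segment["top"].values()), sum(segment["bottom"].values())


for name, segment in SEGMENTS.items():
    top, bottom = strand_totals(segment)
    print(name, top, bottom, STATED_BP[name])
# A 228 228 228
# M 252 252 252
```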

As proposed in Figure 3, segment M is joined to segment A in the plasmid in which segment A is contained. Segment M is excised at the flanking restriction sites from its cloning vector and spliced into pIC20K, harboring segment A, through successive digestions with HindIII followed by M…. The pIC20K vector now comprises segment A joined to segment M with a HindIII site at the splice site (see Figure 3). Plasmid pIC20K is derived from pIC20R by removing the ScaI-NdeI DNA fragment and inserting a HindII fragment containing an NPTI coding region. The resulting plasmid of 4.44 kb confers resistance to kanamycin on E. coli.

Example 3. Expression of synthetic crystal protein gene in bacterial systems

The synthetic Btt gene is designed so that it is expressed in the pIC20R-kan vector in which it is constructed. This expression utilizes the initiation methionine of the lacZ protein of pIC20K. The wild-type Btt crystal protein sequence expressed in this manner has full insecticidal activity. In addition, the synthetic gene is designed to contain a BamHI site 5'-proximal to the initiating methionine codon and a BglII site 3' to the terminal TAG translation stop codon. This facilitates the cloning of the insecticidal crystal protein coding region into bacterial expression vectors such as pDR540 (Russell and Bennett, 1982). Plasmid pDR540 contains the tac promoter, which allows the production of proteins, including Btt crystal protein, under controlled conditions in amounts up to 10% of the total bacterial protein. This promoter functions in many gram-negative bacteria, including E. coli and Pseudomonas.
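The flanking-site layout described above can be checked mechanically. The sketch below is a hypothetical helper, not part of the source: it confirms that a construct has a single BamHI site lying wholly 5' of the initiating ATG and a single BglII site lying wholly 3' of the TAG stop codon. Requiring each site to occur only once is our assumption, suggested by the statement in Example 2 that unique sites are placed at each end of the gene. The short test construct is made up and is not the Btt sequence.

```python
BAMHI = "GGATCC"  # BamHI recognition site (G^GATCC)
BGLII = "AGATCT"  # BglII recognition site (A^GATCT)


def find_all(seq, site):
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def flanks_ok(seq, start, stop):
    """start: 0-based index of the initiating ATG; stop: 0-based index of the TAG.

    True if the only BamHI site lies wholly 5' of the ATG and the only BglII
    site lies wholly 3' of the TAG.
    """
    seq = seq.upper()
    bam, bgl = find_all(seq, BAMHI), find_all(seq, BGLII)
    return (seq[start:start + 3] == "ATG"
            and seq[stop:stop + 3] == "TAG"
            and len(bam) == 1 and bam[0] + len(BAMHI) <= start
            and len(bgl) == 1 and bgl[0] >= stop + 3)


construct = "GGATCC" + "ATG" + "GCTGCAGCC" + "TAG" + "AGATCT"
print(flanks_ok(construct, start=6, stop=18))  # True
```

An internal BamHI or BglII site inside the coding region would make the check fail, which is the situation the unique-site design is meant to avoid.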

Production of Bt insecticidal crystal protein from the synthetic gene in bacteria demonstrates that the protein produced has the expected toxicity to coleopteran insects. These recombinant bacterial strains, which produce the product of the synthetic gene, themselves have potential value as microbial insecticides.

Example 4. Expression of a synthetic crystal protein gene in plants

The synthetic Btt crystal protein gene is designed to facilitate cloning into expression cassettes. These utilize sites compatible with the BamHI and BglII restriction sites flanking the synthetic gene. Cassettes are available that utilize plant promoters including CaMV 35S, CaMV 19S and the ORF 24 promoter from T-DNA. These cassettes provide the recognition signals essential for expression of proteins in plants. The cassettes are utilized in micro Ti plasmids such as pH575. Plasmids such as pH575 containing the synthetic Btt gene directed by plant expression signals are utilized in disarmed Agrobacterium tumefaciens to introduce the synthetic gene into plant genomic DNA. This system has been described previously by Adang et al. (1987) to express the Bt var. kurstaki crystal protein gene in tobacco plants. These tobacco plants were toxic to feeding tobacco hornworms.
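"Compatible" sites here means sites whose cut ends carry identical single-stranded overhangs, so that they can anneal and be ligated. A small sketch, using the standard recognition sequences and cut positions of enzymes named in this document (the helper names are hypothetical), shows that BamHI and BglII both leave a 5' GATC overhang, whereas EcoRI does not. It also shows that a BamHI end ligated to a BglII end forms the hybrid sequence GGATCT, which matches neither recognition site, so such a junction is not recut by either enzyme.

```python
# Recognition sites and top-strand cut offsets (standard values).
ENZYMES = {
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "BglII": ("AGATCT", 1),    # A^GATCT
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def overhang(enzyme):
    """5' overhang left by a palindromic site cut at the same offset on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(e1, e2):
    """Cut ends can anneal if their single-stranded overhangs are identical."""
    return overhang(e1) == overhang(e2)


def hybrid_site(e1, e2):
    """Top-strand sequence formed when an e1 end (left) is ligated to an e2 end (right)."""
    site1, cut1 = ENZYMES[e1]
    site2, cut2 = ENZYMES[e2]
    return site1[:cut1] + overhang(e1) + site2[len(site2) - cut2:]


print(overhang("BamHI"), overhang("BglII"))                        # GATC GATC
print(compatible("BamHI", "BglII"), compatible("BamHI", "EcoRI"))  # True False
print(hybrid_site("BamHI", "BglII"))                               # GGATCT
```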

Example 5. Assay for insecticidal activity

Bioassays were conducted essentially as described by Sekar, V. et al. supra. Toxicity was assessed by an estimate of the LD50. Plasmids were grown in E. coli JM105 (Yanisch-Perron, C. et al. (1985) Gene 33:103-119). On a molar basis, no significant differences in toxicity were observed between crystal proteins encoded by p544Pst-Met5, p544-HindIII and pNSBP544. When expressed in plants under identical conditions, cells containing protein encoded by the synthetic gene were observed to be more toxic than those containing protein encoded by the native Btt gene. Immunoblots ("western" blots) of cell cultures indicated that the more toxic cultures contained more crystal protein antigen. Improved expression of the synthetic Btt gene relative to that of a natural Btt gene was also evident in Northern blot assays, in which specific mRNA transcripts from expression of the synthetic Btt gene could be quantitated.



Claims 



1. A synthetic gene designed to be highly expressed in plants comprising a DNA sequence encoding an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt.

2. A synthetic gene of claim 1 wherein said DNA sequence is at least about 85% homologous to a native insecticidal protein gene of Btt.

3. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1, spanning nucleotides 1 through 1793.

4. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1, spanning nucleotides 1 through 1833.

5. A synthetic gene of claim 1 wherein the overall frequency of preferred codon usage within the entire coding region of said synthetic gene is within about 75% of the frequency of codon usage preferred in plants.

6. A synthetic gene of claim 1 wherein the A+T base content of said DNA sequence is substantially equal to the A+T base content found in plant structural genes.

7. A synthetic gene of claim 1 wherein a plant initiation sequence is present at the 5' end of the coding region.

8. A synthetic gene of claim 1 wherein plant polyadenylation signals, comprising those having AATAAA, AATGAA, AATAAT, AATATT, GATAAA, GATAAA and AATAAG motifs, are eliminated in said DNA sequence.

9. A synthetic gene of claim 1 wherein the polymerase II termination sequence, CAN7-9AGTNNAA, is eliminated in said DNA sequence.

10. A synthetic gene of claim 1 wherein CUUCGG hairpins are eliminated in said DNA sequence.

11. A synthetic gene of claim 1 wherein plant consensus splice sites, including 5'=AAG:GTAAGT and 3'=TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C, are eliminated in said DNA sequence.

12. A synthetic gene of claim 1 wherein the CG and TA doublet avoidance indices are substantially equal to that of highly expressed genes in the selected host plant.

13. A recombinant DNA cloning vector comprising said synthetic gene of claim 1.

14. A plant cell which contains the synthetic gene of claim 1.

15. An improved method of producing a protein toxic to an insect comprising the step of introducing into a host plant cell a DNA segment comprising a synthetic gene designed to be highly expressed in plants comprising a DNA sequence encoding an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt, such that said synthetic gene is expressed in said plant host.
CODING REGION OF SYNTHETIC Btt GENE
DIVIDED INTO 13 SEGMENTS

SEGMENT A + SEGMENT M = SEGMENT AM
SEGMENT AM + SEGMENT BL = SEGMENT ABLM
SEGMENT ABLM + SEGMENT CK = SEGMENT ABCKLM
SEGMENT ABCKLM + SEGMENT DJ = SEGMENT ABCDJKLM
SEGMENT ABCDJKLM + SEGMENT EI = SEGMENT ABCDEIJKLM
SEGMENT F + SEGMENT H = SEGMENT FH
SEGMENT FH + SEGMENT G = SEGMENT FGH
SEGMENT ABCDEIJKLM + SEGMENT FGH = SEGMENT ABCDEFGHIJKLM

FIG. 2